Stable Cooperative Solutions for the
Iterated Prisoner's Dilemma

Ethan Akin
Mathematics Department
The City College
137 Street and Convent Avenue
New York City, NY 10031, USA

October, 2012

Abstract

There exists a class of Markov strategies for the iterated Prisoner's 
Dilemma which, long term, assure the cooperative payoff for a pair of 
rational players. When they both use these strategies the cooperative 
level is achieved by each. Neither player can benefit by moving unilaterally
to any other strategy. In fact, if a player moves unilaterally to
a strategy which reduces the opponent's payoff below the cooperative 
level then his own payoff is reduced below it as well. Thus, if we limit 
attention to the long term payoff, then these good strategies effectively 
stabilize cooperative behavior. 

1 Introduction



The Prisoner's Dilemma is a two person game which provides a simple model 
of a disturbing social phenomenon. 

In the general symmetric two-person-two-strategy game each of the two 
players, X and Y, has a choice between two strategies, c and d. Thus, there 
are four outcomes which we list in the order: cc, cd, dc, dd, where, for example, 
cd is the outcome when X plays c and Y plays d. Each then receives a payoff. 
Both receive R at cc and P at dd. At cd and dc, the c player gets S and the 
d player gets T. Thus, we can describe the payoffs to X with the 2x2 chart: 
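\[
\begin{array}{c|cc}
X \backslash Y & c & d \\ \hline
c & R & S \\
d & T & P
\end{array}
\tag{1.1}
\]

Alternatively we can describe the payoff vectors for each player

\[
S_X = \begin{pmatrix} R \\ S \\ T \\ P \end{pmatrix}, \qquad
S_Y = \begin{pmatrix} R \\ T \\ S \\ P \end{pmatrix}.
\tag{1.2}
\]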



X can use a mixed strategy when he randomizes, adopting c with probability
$p_c$ and d with the complementary probability $1 - p_c$. Of course, the
probability $p_c$ lies between 0 and 1 with the extreme values corresponding to
the pure strategies c and d.

When S = T then the payoffs to the two players are equal no matter 
what strategy each chooses. The game becomes a coordination game of 
getting to a location with the best joint payoff, which is only interesting 
when S = T > max(R, P) or R = P > S = T, and no communication is
allowed. When $S \neq T$ we choose the labeling so that T > S.

Davis (1983) and Straffin (1993) provide clear introductory discussions of
the elements of game theory. For a lovely description of biological
applications, see Sigmund (1993).

We will focus on the Prisoner's Dilemma, where

\[
T > R > P > S \qquad \text{and} \qquad 2R > T + S.
\tag{1.3}
\]

The strategy c is cooperation. When both players cooperate they each receive
the reward for cooperation (= R). The strategy d is defection. When both 
players defect they each receive the punishment for defection (= P). But 
if one player cooperates and the other does not then the defector receives 
the large temptation payoff (= T) while the hapless cooperator receives the very
small sucker's payoff (= S). The condition 2R > T + S says that the reward 
for cooperation is larger than the players would receive from sharing equally 
the total payoff of a cd or dc outcome. Thus, the maximum total payoff 
occurs uniquely at cc and that location is a strict Pareto optimum which 
means that at every other outcome at least one player does worse. The 
cooperative outcome cc is clearly where the players "should" end up. If they 
could negotiate a binding agreement in advance of play they would agree to 
play c and each receive R. However, the structure of the game is such that at 
the time of play, each chooses a strategy in ignorance of the other's choice. 

This is where it gets ugly. In game theory lingo, the strategy d strictly 
dominates strategy c. This means that whatever Y's choice is, X receives a 
larger payoff by playing d than by using c. In the array (1.1) each number
in the d row is larger than the corresponding number in the c row above it. 
Hence, X chooses d and for exactly the same reason Y chooses d and so they 
are driven to the dd outcome with payoff P for each. Having firmly agreed 
to cooperate, X hopes that Y will stick to the agreement because then X can 
obtain the large payoff T by defecting. Furthermore, if he were not to play 
d then he risks getting S when Y defects. All the more reason to defect as X 
realizes Y is thinking the same thing. 
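
The domination argument can be checked mechanically. The following Python
sketch uses illustrative payoff values T = 5, R = 3, P = 1, S = 0 (an
assumption, not values taken from the text) and hypothetical helper names. It
tests the two Prisoner's Dilemma conditions and compares the d row with the c
row entry by entry.

```python
# Illustrative payoffs (assumed for this sketch): T=5, R=3, P=1, S=0.
T, R, P, S = 5, 3, 1, 0

# Payoff to X, indexed by (X's play, Y's play).
payoff_x = {("c", "c"): R, ("c", "d"): S, ("d", "c"): T, ("d", "d"): P}

def is_prisoners_dilemma(T, R, P, S):
    """Check T > R > P > S together with 2R > T + S."""
    return T > R > P > S and 2 * R > T + S

def d_strictly_dominates_c(payoff):
    """d dominates c when it pays X strictly more against every choice of Y."""
    return all(payoff[("d", y)] > payoff[("c", y)] for y in ("c", "d"))

print(is_prisoners_dilemma(T, R, P, S))
print(d_strictly_dominates_c(payoff_x))
```

With these values both checks succeed, while the pair of payoffs at dd
(P = 1 each) is worse for both players than the pair at cc (R = 3 each).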

The payoffs are often stated in money amounts or in years reduced from 
a prison sentence (the original "prisoner" version). But it is important to 
understand that the payoffs are really in units of utility. That is, the ordering 
in (1.3) is assumed to describe the order of desirability of the various
outcomes to each player when the full ramifications of each outcome are taken
into account. Thus, if X is induced to feel guilty at the dc outcome then 
the payoff to X of that outcome is reduced. Adjusting the payoffs is the 
classic way of stabilizing cooperative behavior. Suppose prisoner X walks 
out of prison, free after defecting, having consigned Y, who played c, to a 
20 year sentence. Colleagues of Y might well do X some serious damage. 
Anticipation of such an event considerably reduces the desirability of dc for 
X, perhaps to well below R. If X and Y each have threatening friends then it 
is reasonable for each to expect that a prior agreement to play cc will stand 
and so they each receive R. However, in terms of utility this is no longer 
a Prisoner's Dilemma. In the book which originated modern game theory,
Von Neumann and Morgenstern (1944), the authors developed an axiomatic
theory of utility which allows us to make sense of such arithmetic
relationships as the second inequality in (1.3). We need not consider this here but
the reader should remember that the payoffs are numerical measurements of 
desirability. 

This two person collapse of cooperation can be regarded as a simple model 
of what Garrett Hardin (1968) calls the tragedy of the commons. This is a
similar sort of collapse of mutually beneficial cooperation on a multi-person 
scale. 

In attempting to devise a theoretical approach which will avert this 
tragedy, attention has focused on repeated play. X and Y play repeated 
rounds of the same game. For each round the players' choices are made in- 
dependently but each is aware of all of the previous outcomes. The hope is 
that the threat of future retaliation will rein in the temptation to defect in 
the current round. 

There is a dismal result which applies when the number of rounds is 
known. Suppose it is 100. On the last round the past history is irrelevant. 
We are back to the original Prisoner's Dilemma and the logic of domination 
leads to mutual defection. But knowing this, as both players do, there is 
no benefit to cooperating on the 99th round. That is, among the strategies
which play d on the 100th each is dominated by one which plays d on the
99th as well. A backward induction leads to constant defection. The relative
domination used here feels less convincing than the domination argument 
for a single round, but it is hard to argue against its logic. We can fix the 
problem by ruling out knowledge of the length of play. However, the result 
does suggest that as the players observe the terminus of their interactions 
approaching, they become more likely to defect. 

Robert Axelrod devised a tournament in which submitted computer pro- 
grams played against one another. The results are described and analyzed 
in his landmark book, The evolution of cooperation (1984). The winning 
program, Tit-for-Tat, was submitted by game theorist Anatol Rapoport. It
consists in playing in each round the strategy used by the opponent in the 
previous one. 

Tit-for-Tat is an example of a Markov strategy, which bases its response
entirely on the outcome of the previous round. See, for example, Nowak (2003)
Chapter 5. With the outcomes listed in order as cc, cd, dc, dd, a Markov
strategy for X is a vector $p = (p_1, p_2, p_3, p_4) = (p_{cc}, p_{cd}, p_{dc}, p_{dd})$
where $p_z$ is the probability of playing c when the outcome z occurred in the
previous round. If Y uses strategy vector $q = (q_1, q_2, q_3, q_4)$ then the
Markov response is $(q_{cc}, q_{cd}, q_{dc}, q_{dd}) = (q_1, q_3, q_2, q_4)$ and the
successive outcomes follow a Markov chain with transition matrix given by:

\[
M = \begin{pmatrix}
p_1 q_1 & p_1 (1 - q_1) & (1 - p_1) q_1 & (1 - p_1)(1 - q_1) \\
p_2 q_3 & p_2 (1 - q_3) & (1 - p_2) q_3 & (1 - p_2)(1 - q_3) \\
p_3 q_2 & p_3 (1 - q_2) & (1 - p_3) q_2 & (1 - p_3)(1 - q_2) \\
p_4 q_4 & p_4 (1 - q_4) & (1 - p_4) q_4 & (1 - p_4)(1 - q_4)
\end{pmatrix}.
\tag{1.4}
\]

We use the switch in numbering from the Y strategy q to the Y response
vector because switching the perspective of the players interchanges cd and
dc. This way the "same" strategy for X and for Y is given by the same
probability vector. For example, Tit-for-Tat for both X and Y is given by
p = q = (1, 0, 1, 0) but the response vector for Y is (1, 1, 0, 0). Repeat is
given by p = q = (1, 1, 0, 0) with response vector for Y (1, 0, 1, 0). This
strategy just repeats the previous play regardless of what the opponent did.
The strategy Cooperate = (1, 1, 1, 1) always plays c while Defect = (0, 0, 0, 0)
always plays d. We will refer to this symmetry of the game as the XY switch.
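
As an illustration, the transition matrix can be assembled directly from p and
q. The sketch below is a minimal Python version under two assumptions: the
function and constant names are hypothetical, and each transition probability
is taken to be the product of the two players' choice probabilities, as is
appropriate for choices made independently within a round. Exact rational
arithmetic is used so that the results can be compared without rounding.

```python
from fractions import Fraction

TIT_FOR_TAT = (1, 0, 1, 0)
REPEAT = (1, 1, 0, 0)

def markov_matrix(p, q):
    """Transition matrix over (cc, cd, dc, dd) when X plays p and Y plays q."""
    # Y sees cd and dc interchanged, so its response vector is (q1, q3, q2, q4).
    response_y = (q[0], q[2], q[1], q[3])
    rows = []
    for px, qy in zip(p, response_y):
        px, qy = Fraction(px), Fraction(qy)
        # Probabilities of moving to cc, cd, dc, dd from the current outcome.
        rows.append([px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)])
    return rows

for row in markov_matrix(TIT_FOR_TAT, TIT_FOR_TAT):
    print([int(x) for x in row])
```

For Tit-for-Tat against itself the rows show cc and dd fixed and cd, dc
exchanged, while `markov_matrix(REPEAT, REPEAT)` is the identity matrix.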

We describe some elementary facts about finite Markov chains, see, e.g. 
Karlin and Taylor (1975) Chapter 2. 

A Markov matrix like M is a non-negative matrix with row sums equal to
1. That is, the column vector $\mathbf{1}$ is a right eigenvector with eigenvalue 1. For
such a matrix we can represent the associated Markov chain as movement
along a directed graph with vertices the states, in this case, cc, cd, dc, dd, and
with a directed edge from the i-th state $z_i$ to the j-th state $z_j$ when
$M_{ij} > 0$, that is, when we can move from $z_i$ to $z_j$ with positive
probability. In particular, there is an edge from $z_i$ to itself iff the
diagonal entry $M_{ii}$ is positive.

A path in the graph is a state sequence $z^1, \dots, z^n$ with n > 1 such that
there is an edge from $z^i$ to $z^{i+1}$ for $i = 1, \dots, n - 1$. A set of states I
is called a closed set when no path which begins in I can exit I. For example,
the entire set of states is closed and for any z the set of states accessible via
a path which begins at z is a closed set. I is closed iff $M_{ij} = 0$ whenever
$z_i \in I$ and $z_j \notin I$. In particular, when we restrict the chain to a closed set
I, the associated submatrix of M still has row sums equal to 1. A minimal,
nonempty, closed set of states is called a terminal set. A state is called
recurrent when it lies in some terminal set and transient when it does not.
The following facts are easy to check.

• A nonempty, closed set of states I is terminal iff whenever $z_i, z_j \in I$
there exists a path from $z_i$ to $z_j$.

• If I is a terminal set and $z_j \in I$ then there exists $z_i \in I$ with an edge
from $z_i$ to $z_j$.

• Distinct terminal sets are disjoint. 

• Any nonempty, closed set contains at least one terminal set. 

• From any transient state there is a path into some terminal set. 
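
These facts suggest a simple way to compute the terminal sets of a small
chain. The Python sketch below is one possible approach, with hypothetical
function names: it forms, for each state, the set consisting of that state and
everything accessible from it, and keeps those sets in which every member
generates the same set. The two example matrices are written out by hand for
Tit-for-Tat against Tit-for-Tat and for Tit-for-Tat against Cooperate.

```python
def reachable(M, start):
    """The state `start` together with all states accessible from it."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j, m in enumerate(M[i]):
            if m > 0 and j not in seen:   # edge from state i to state j
                seen.add(j)
                stack.append(j)
    return frozenset(seen)

def terminal_sets(M):
    """Closed sets in which every state can reach every other state."""
    closures = [reachable(M, i) for i in range(len(M))]
    return {c for c in closures if all(closures[j] == c for j in c)}

# States are indexed 0..3 for cc, cd, dc, dd.
M_TFT_TFT = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
M_TFT_COOP = [[1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0]]

print(terminal_sets(M_TFT_TFT))
print(terminal_sets(M_TFT_COOP))
```

In the first example there are three terminal sets, {cc}, {cd, dc} and {dd}.
In the second only {cc} is terminal and the other three states are transient.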

A distribution v on the set of states is a non-negative column vector
normalized by $v^T \mathbf{1} = 1$, i.e. a probability distribution on the set of states.
Given an initial distribution $v^0$ the Markov process evolves in discrete time
via the equation

\[
(v^{n+1})^T = (v^n)^T M.
\tag{1.5}
\]

In our game context, the initial distribution is given by the initial plays, pure
or mixed, of the two players. If X uses initial probability $p_c$ and Y uses $q_c$
then

\[
v^0 = \begin{pmatrix}
p_c q_c \\ p_c (1 - q_c) \\ (1 - p_c) q_c \\ (1 - p_c)(1 - q_c)
\end{pmatrix}.
\tag{1.6}
\]

Then $v^n_i$ is the probability that outcome $z_i$ occurs on the n-th round of play.
A distribution v is stationary when it satisfies $v^T M = v^T$. That is, it is a
left eigenvector with eigenvalue 1. From Perron-Frobenius theory (see, e.g.,
Karlin and Taylor (1975) Appendix 2) it follows that if I is a terminal set
then there is a unique stationary distribution v with $v_i > 0$ iff $z_i \in I$. That is,
the support of v is exactly I. In particular, if the eigenspace of M associated
with the eigenvalue 1 is one dimensional then there is a unique stationary
distribution and so a unique terminal set which is the support of the
stationary distribution. The converse is also true and any stationary distribution v
is a mixture of the $v_J$'s where $v_J$ is supported on the terminal set J. This
follows from the fact that any stationary distribution v satisfies $v_j = 0$ for all
transient states and so is supported on the set of recurrent states. Hence,
the following are equivalent in our 4x4 case.

• There is a unique terminal set of states for the process associated with 
M. 

• There is a unique stationary distribution vector for M. 

• The matrix M' = M - I has rank 3.

We will call M convergent when these conditions hold. For example, when
all of the probabilities of p and q lie strictly between 0 and 1 then all the
entries of M given by (1.4) are positive and so the entire set of states is the
unique terminal set and the positive matrix M is convergent.

In the convergent case the sequence of the Cesaro averages
$\frac{1}{n} \sum_{k=0}^{n-1} M^k$ converges to the matrix $\mathbf{1} v^T$. In particular,
regardless of the initial distribution, the sequence of averages of the outcome
distributions converges to v. That is,

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} v^k = v.
\tag{1.7}
\]

Hence, using the payoff vectors from (1.2) the long run average payoffs for X
and Y converge to

\[
s_X = v^T S_X, \qquad s_Y = v^T S_Y.
\tag{1.8}
\]

In the non-convergent case the long term payoffs depend on the initial
distribution. Suppose there are exactly two terminal sets I and J with
stationary distribution vectors $v_I$ and $v_J$ supported on I and J, respectively.
For any initial distribution $v^0$ there are probabilities $p_I$ and $p_J = 1 - p_I$ of
entering and so terminating in I or J, respectively. In that case, the long
term payoffs are given by

\[
s_X = v^T S_X, \qquad s_Y = v^T S_Y \qquad \text{with} \quad v = p_I v_I + p_J v_J.
\tag{1.9}
\]

This extends in an obvious way when there are more terminal sets.

In our Prisoner's Dilemma case, we will call a strategy vector p agreeable
when $p_1 = 1$ and firm when $p_4 = 0$. That is, an agreeable strategy always
responds to cc with c and a firm strategy always responds to dd with d. If
both p and q are agreeable then {cc} is a terminal set for the Markov matrix
M given by (1.4) and so $v^T = (1, 0, 0, 0)$ is a stationary distribution with
fixation at cc. If both p and q are firm then {dd} is a terminal set for M
and $v^T = (0, 0, 0, 1)$ is a stationary distribution with fixation at dd. Any
convex combination of agreeable strategies (or firm strategies) is agreeable
(resp. firm).

Tit-for-Tat p = (1, 0, 1, 0) and Repeat p = (1, 1, 0, 0) are each agreeable
and firm. The same is true for any mixture of these. If both X and Y
use Tit-for-Tat then the outcome is determined by the initial play. Initial
outcomes cc and dd lead to immediate fixation. Either cd or dc results in
period 2 alternation between these two states. Thus {cd, dc} is another terminal
set with stationary distribution (0, 1/2, 1/2, 0). If any positive mixture of the
Repeat strategy is used by either player then eventually fixation at cc or dd is
achieved. There are then only two terminal sets instead of three. The period
2 alternation described above illustrates why we used the Cesaro limit, i.e.
the limit of averages, in (1.7) rather than the limit per se.
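
The role of the averaging can be seen in a short computation. The Python
sketch below (hypothetical function names; illustrative payoffs T = 5, R = 3,
P = 1, S = 0, which are an assumption) starts Tit-for-Tat against Tit-for-Tat
at the outcome cd, averages the first n outcome distributions, and evaluates
the resulting long term payoffs.

```python
from fractions import Fraction

# Tit-for-Tat against Tit-for-Tat: cd and dc alternate, cc and dd are fixed.
M_TFT = [[1, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 1]]

def step(v, M):
    """One step of the chain: the row vector v^T is multiplied by M."""
    return [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]

def cesaro_average(v0, M, n):
    """Average of the first n outcome distributions v^0, ..., v^(n-1)."""
    v = [Fraction(x) for x in v0]
    total = [Fraction(0)] * 4
    for _ in range(n):
        total = [t + x for t, x in zip(total, v)]
        v = step(v, M)
    return [t / n for t in total]

def payoff(v, S):
    """Long run payoff v^T S for a payoff vector S."""
    return sum(a * b for a, b in zip(v, S))

# Illustrative payoffs (assumed): T, R, P, S = 5, 3, 1, 0.
S_X, S_Y = (3, 0, 5, 1), (3, 5, 0, 1)
v_bar = cesaro_average([0, 1, 0, 0], M_TFT, 1000)   # start at outcome cd
print(v_bar, payoff(v_bar, S_X), payoff(v_bar, S_Y))
```

The individual distributions keep alternating between cd and dc and so have no
limit, but the average over an even number of rounds is exactly
(0, 1/2, 1/2, 0). With the assumed payoffs each player then receives 5/2, which
is below R = 3 as the condition 2R > T + S requires.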

If both players use Repeat then M = I. Each state comprises a terminal
set and fixation occurs after the initial play.

A program for a player consists of a strategy vector p together with an
initial play $p_c$ (= the probability of using c on the initial play). This bears
the same relation to a strategy vector as an initial value problem does to the
associated ordinary differential equation.

We should remark that Rapoport's Tit-for-Tat program consists of the
agreeable strategy that we are calling Tit-for-Tat together with c as initial
play, i.e. with $p_c = 1$. In general, an agreeable program is an agreeable
strategy together with c as initial play.

By applying recent work by Press and Dyson (2012) we will show that, at 
least in the limited context of long term payoffs, there are strategies which 
solve the problem of the iterated Prisoner's Dilemma. 

Theorem 1.1 Assume that X uses the Tit-for-Tat program.

(1) If Y chooses an agreeable program then the outcome sequence is
immediately fixed at cc and, a fortiori, the long term payoffs satisfy
$s_X = s_Y = R$.

(2) There does not exist a program for Y which when played against the
Tit-for-Tat will yield $s_Y > R$.

(3) For any program for Y, $s_X = s_Y$ and so if $s_Y = R$ then $s_X = R$ as well.

Theorem 1.2 There exists a class S of agreeable Markov strategies for the
Prisoner's Dilemma with the following properties.
Assume that X chooses a strategy from S.

(1) If Y chooses a strategy from S as well then the associated Markov matrix
M is convergent and $s_X = s_Y = R$.

(2) There does not exist a program for Y which when played against the S
strategy will yield $s_Y > R$.

(3) If Y uses any program such that $s_Y = R$ then $s_X = R$ as well.

The first result in Theorem 1.1 is obvious. Any pair of agreeable programs
immediately fix at cc. In particular, this is true for Tit-for-Tat programs. 

If X announces the intention to use an S strategy then Y can adopt any 
agreeable strategy such that the associated M is convergent. Then fixation 
at cc is the only terminal class and the cooperative payoff sx = sy = R 
follows, independent of the initial plays. If X and Y play c on the initial 
round then, as with Tit-for-Tat, fixation at cc occurs immediately. 

Against an S strategy for X, Y can use Tit-for-Tat, which is not in S, but 
which yields a convergent Markov matrix when played against a strategy in 
S. In the absence of such prior information it is best for each player to use 
an S strategy. 

More significant are results (2) and (3). Y cannot obtain a payoff against 
X which is better than the cooperative payoff R. Against a strategy in S
it is possible for Y to play so that $s_Y > s_X$ and $s_X < R$. But in that case
$s_Y < R$ as well. It is the absolute payoff which matters to Y and not the
comparison with what X receives. This provides the incentive to move Y to
a joint cooperative payoff position.

Definition 1.3 A Markov strategy p is called good if it is agreeable and
whenever Y chooses a strategy such that $s_Y \ge R$ then $s_Y = s_X = R$.

The good strategies like Tit-for-Tat and those in S serve to stabilize the
cooperative payoff for both players.

The only real problem with Tit-for-Tat is the delicacy caused by the
non-convergence of the associated Markov matrix. Because of the dependence on
the initial plays, noise in the system might throw the sequence of outcomes
into a lower payoff terminal set. The strategies in S lead to convergent
matrices and so the long term results are independent of initial play. This means
that when such strategies are used, the system can recover from the effects 
of noise. At cc the system is fixed. From other outcomes there occur paths 
through the lower payoff transient states before fixation at cc is achieved. 

In the next section we will describe the Zero Determinant Strategies
introduced by Press and Dyson and use them to obtain the results on good
strategies described above. In Section 3 we consider the evolutionary game
dynamics among such strategies. Finally, in Section 4, we extend the
Press-Dyson analysis to provide a parametrization for all Markov strategies and
we use this to find additional good strategies which are not of the Zero
Determinant type.

2 Zero Determinant Strategies 

We begin with a bit of trickery which will allow us to deal with Markov 
matrices which are not convergent. 

Call a strategy p lazy when at least three of the four equations
$p_1 = 1, p_2 = 1, p_3 = 0, p_4 = 0$ are satisfied. For example, the strategy is Repeat
when all four hold. If X uses a strategy such that $p_1 = p_2 = 1$ then he
always plays c when he used it on the last round. Hence, regardless of what
strategy Y adopts the set {cc, cd} is a closed set for the graph associated with
the Markov matrix M. Similarly, if $p_3 = p_4 = 0$ then {dc, dd} is a closed set
regardless of Y's strategy choice.

Lemma 2.1 (a) Assume that X adopts a strategy p which is not lazy. Let
q be a strategy for Y with Markov matrix M when q is played against p.
Assume that I is a terminal set for M and that $v_I$ is the stationary
distribution with support I. Let $\bar q$ be a strategy vector for Y which uses the same
response probability as that of q for $z_i \in I$ and which uses a response
probability strictly between 0 and 1 when $z_i \notin I$. Let $\bar M$ be the Markov matrix
for $\bar q$ against p. The matrix $\bar M$ is convergent with unique terminal set I and
with stationary distribution $v_I$.

(b) Assume instead that X adopts a lazy strategy $p = (p_1, 1, 0, 0)$ with
$p_1 < 1$. Let q be a strategy for Y, and M be the Markov matrix for q against
p. The set C = {dc, dd} is a closed set for M. Assume that $I \subset C$ is a
terminal set of states for M and that $v_I$ is the stationary distribution with
support I. Let $\bar q$ and $\bar M$ be defined as in (a). The matrix $\bar M$ is convergent
with unique terminal set I and stationary distribution vector $v_I$.

Proof: The probability of moving from state $z_i$ to state $z_j$ depends
only on the values of the response vectors for p and q at the state $z_i$. For
$z_i \in I$ these are unchanged and so I is still a terminal set for $\bar M$ and $v_I$ is
the stationary distribution for $\bar M$ supported on I. We are left with showing
that $\bar M$ is convergent, i.e. that I is the only terminal set. It suffices to show
that for any $z_j \notin I$ there is a path in the $\bar M$ graph which begins at $z_j$ and
which enters I.

Now let q'' be a strategy vector for Y with $0 < q''_i < 1$ for all i and let M''
be the Markov matrix for q'' against p. If $z_i \notin I$ then for any state $z_j$, there
is an edge from $z_i$ to $z_j$ for $\bar M$ iff there is such an edge for M''.

(a) First, assume that not both $p_1 = 1, p_2 = 1$ and that not both
$p_3 = 0, p_4 = 0$. Because $p_1 < 1$ or $p_2 < 1$, from at least one outcome z in {cc, cd}
there is an edge to an outcome in {dc, dd} for M''. Since $q''_i < 1$ for all i there
is an edge from z to both outcomes in {dc, dd}. Similarly, $p_3 > 0$ or $p_4 > 0$
and $q''_i > 0$ for all i implies there is an edge from an outcome in {dc, dd} to
both outcomes in {cc, cd}. Thus, every state is accessible from every other
and {cc, cd, dc, dd} is the unique terminal set for M''. Hence, for any state
$z \notin I$ there is for M'' a path sequence $z^0, z^1, \dots, z^n$ with $z = z^0$ and
$z^n \in I$ and $z^i \notin I$ for i < n. This is also a path sequence for $\bar M$ and so
every $z \notin I$ is transient for $\bar M$. Thus, the terminal set I for $\bar M$ is unique.

Now assume that $p_1 = p_2 = 1$. This means that for any strategy for Y
the set {cc, cd} is closed. Because p is not lazy we have $p_3 > 0$ and $p_4 > 0$.
So for any strategy for Y there is an edge from dc into {cc, cd} and an edge
from dd into this closed set as well. Thus, dc and dd are always transient
and in particular, when Y uses q we see that $I \subset \{cc, cd\}$. For M'' there
are edges in both directions between the elements of {cc, cd}. If one of these
states is not in I then the edge from it to the other is in $\bar M$ as well and so this
state, in addition to dc and dd, is transient. Hence, again I is the unique
terminal set for $\bar M$.

In the final case, $p_3 = p_4 = 0$ and so $p_1, p_2 < 1$. We get that {dc, dd} is
closed and cc, cd are transient for any strategy for Y. The proof that I is the
unique terminal set for $\bar M$ is similar to the above case.

(b) If $p = (p_1, 1, 0, 0)$ then C = {dc, dd} is closed as above. Since $p_1 < 1$
it follows that there is an edge from cc to either dc or dd for any Y strategy.
Hence, cc is always transient. Because $cd \notin I$, $\bar q_3 > 0$ implies that with
respect to $\bar M$ there is an edge from cd to dc or to the transient state cc.
Hence, cd is transient for $\bar M$ as well. Thus, if I = C it is the unique terminal
set for $\bar M$. If $z \in C \setminus I$ then $I = \{z'\} = C \setminus \{z\}$. Because $z \notin I$ its response
probability for $\bar q$ is strictly between 0 and 1. Hence, there is an edge from z
to z' for $\bar M$. Thus, z is transient for $\bar M$ as well and I = {z'} is the unique
terminal set for $\bar M$.
□



Remark: The full result of (a) fails for lazy p. In the case of Repeat with
p = (1, 1, 0, 0) there is always a terminal set in {cc, cd} and one in {dc, dd}
regardless of Y's strategy. If $p = (p_1, 1, 0, 0)$ then when $q_3 = 0$ fixation at
cd is another terminal set. In (b) we showed that if $p_1 < 1$ then we can
eliminate the terminal set {cd} using the $\bar q$ strategy. However, we cannot
eliminate all the terminal sets in {dc, dd}, keeping only {cd}.



We now give a brief reprise of some of the amazing new results from Press
and Dyson (2012).

For X and Y strategy vectors p and q the Markov matrix is given by
equation (1.4). Now for $f = (f_1, f_2, f_3, f_4)$ define

\[
D(p, q, f) = \det \begin{pmatrix}
-1 + p_1 q_1 & -1 + p_1 & -1 + q_1 & f_1 \\
p_2 q_3 & -1 + p_2 & q_3 & f_2 \\
p_3 q_2 & p_3 & -1 + q_2 & f_3 \\
p_4 q_4 & p_4 & q_4 & f_4
\end{pmatrix}.
\tag{2.1}
\]

Observe that the second and third columns depend just on p and on q,
respectively. We define

\[
\tilde p = \begin{pmatrix} -1 + p_1 \\ -1 + p_2 \\ p_3 \\ p_4 \end{pmatrix}, \qquad
\tilde q = \begin{pmatrix} -1 + q_1 \\ q_3 \\ -1 + q_2 \\ q_4 \end{pmatrix}.
\tag{2.2}
\]

We will call $\tilde p$ and $\tilde q$ the X and Y Press-Dyson vectors associated with
the X strategy vector p and the Y strategy q, respectively.

Proposition 2.2 Assume X and Y play p and q with Markov matrix M.
The quantity $D(p, q, \mathbf{1})$ is nonzero iff M is convergent. In that case for any
f

\[
v^T f = D(p, q, f) / D(p, q, \mathbf{1}),
\tag{2.3}
\]

where v is the unique stationary distribution vector for M.

Proof: When M is convergent the probability vector v is the unique left
eigenvector of M normalized by $v^T \mathbf{1} = 1$. Thus, with M' = M - I, $v^T M' = 0$.
Let Adj(M') be the adjugate matrix, that is, the matrix obtained by
transposing the matrix of signed minors. By Cramer's rule

\[
\mathrm{Adj}(M') M' = \det(M') I = 0.
\tag{2.4}
\]

Thus, each row of Adj(M') is a multiple of $v^T$. Furthermore, the inner
product of the fourth row of Adj(M') with the vector f is the determinant of
the matrix obtained from M' by replacing the fourth column by the column
vector f. Adding the first column of this matrix to the second and third
columns does not affect the determinant and so we see that this inner product
is D(p, q, f).

In the non-convergent case M' has rank less than 3 and the adjugate itself
is the zero matrix. So D(p, q, f) is zero for all f. However, in the convergent
case the rank is 3. Since the columns of M' sum to zero, omitting any of
the four columns yields a linearly independent set of three. This implies that
none of the rows of Adj(M') is identically zero. In particular, the fourth row
is a nonzero multiple of the probability row vector $v^T$. Hence, its dot product
with $\mathbf{1}$, which is $D(p, q, \mathbf{1})$, is nonzero. Thus, $v^T$ is $D(p, q, \mathbf{1})^{-1}$ times the
fourth row. It follows that $D(p, q, f) / D(p, q, \mathbf{1})$ is the dot product $v^T f$.

□
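
Proposition 2.2 can be tested on a concrete pair of strategies. The Python
sketch below builds the determinant D(p, q, f) entry by entry from (2.1),
recovers each coordinate of v by taking f to be a standard basis vector, and
checks that the result is a stationary distribution. The particular p and q
are arbitrary illustrative choices with all entries strictly between 0 and 1,
and the function names are hypothetical.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row (adequate for 4x4)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def D(p, q, f):
    """The determinant D(p, q, f) of (2.1)."""
    p1, p2, p3, p4 = (Fraction(x) for x in p)
    q1, q2, q3, q4 = (Fraction(x) for x in q)
    f1, f2, f3, f4 = (Fraction(x) for x in f)
    return det([[-1 + p1 * q1, -1 + p1, -1 + q1, f1],
                [p2 * q3, -1 + p2, q3, f2],
                [p3 * q2, p3, -1 + q2, f3],
                [p4 * q4, p4, q4, f4]])

def markov_matrix(p, q):
    """Transition matrix over (cc, cd, dc, dd); Y responds with (q1, q3, q2, q4)."""
    r = (q[0], q[2], q[1], q[3])
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
            for a, b in zip(map(Fraction, p), map(Fraction, r))]

def stationary_from_determinants(p, q):
    """Coordinates v_i = D(p, q, e_i) / D(p, q, 1), valid when M is convergent."""
    d1 = D(p, q, (1, 1, 1, 1))
    if d1 == 0:
        raise ValueError("D(p, q, 1) = 0: the Markov matrix is not convergent")
    basis = [[int(i == j) for j in range(4)] for i in range(4)]
    return [D(p, q, e) / d1 for e in basis]

p = (Fraction(9, 10), Fraction(1, 5), Fraction(7, 10), Fraction(1, 10))
q = (Fraction(1, 2), Fraction(1, 3), Fraction(2, 3), Fraction(1, 4))
v = stationary_from_determinants(p, q)
M = markov_matrix(p, q)
vM = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
print(vM == v, sum(v) == 1)
```

For Tit-for-Tat against itself the first column of the matrix in (2.1)
vanishes, so D(p, q, 1) = 0, in agreement with the three terminal sets found
earlier for that pair.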

Recall that the long term payoff to X, denoted $s_X$, is $v^T S_X$ and similarly
$s_Y = v^T S_Y$. Consequently, with $f = \alpha S_X + \beta S_Y + \gamma \mathbf{1}$ we obtain from
equation (2.3)

\[
\alpha s_X + \beta s_Y + \gamma = D(p, q, \alpha S_X + \beta S_Y + \gamma \mathbf{1}) / D(p, q, \mathbf{1}).
\tag{2.5}
\]

Definition 2.3 A strategy vector p for X (or q for Y) is called a Zero
Determinant Strategy (hereafter a ZDS) if for some real numbers $\alpha, \beta, \gamma$,
$\tilde p = \alpha S_X + \beta S_Y + \gamma \mathbf{1}$ (resp. $\tilde q = \beta S_X + \alpha S_Y + \gamma \mathbf{1}$).

The importance of the Zero Determinant Strategies comes from the
following

Theorem 2.4 Assume that p and q are strategy vectors for X and Y with
Markov matrix M. If for some real numbers $\alpha, \beta, \gamma$, $\tilde p = \alpha S_X + \beta S_Y + \gamma \mathbf{1}$
then the Press-Dyson Equation

\[
\alpha s_X + \beta s_Y + \gamma = 0
\tag{2.6}
\]

is satisfied for any initial plays by X and Y.

Proof: In the case when M is convergent, the Press-Dyson equation
follows from Proposition 2.2 and equation (2.5) because the determinant
vanishes when two columns agree. It holds trivially when $\tilde p = 0$ and so
$\alpha = \beta = \gamma = 0$.

Now suppose there are two terminal sets I and J. Applying equation
(1.9) it will suffice to prove

\[
\begin{aligned}
\alpha v_I^T S_X + \beta v_I^T S_Y + \gamma &= 0, \\
\alpha v_J^T S_X + \beta v_J^T S_Y + \gamma &= 0,
\end{aligned}
\tag{2.7}
\]

because then we can multiply the first equation by $p_I$, the second by $p_J$ and
add to get equation (2.6).

Now assume that p is not lazy. By Lemma 2.1 there exist strategies
$q_I$ and $q_J$ with convergent matrices $M_I$ and $M_J$ respectively when played
against p and so that $v_I$ and $v_J$ are the stationary distributions for $M_I$ and
$M_J$, respectively. From (1.8) and (2.6) in these convergent cases we obtain
(2.7).

The proof when there are more than two terminal sets is completely
analogous.

The proof will be completed in Proposition 2.6 below when we deal with
the single possibility when a ZDS, other than Repeat, is lazy.
□
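
The Press-Dyson equation lends itself to a numerical check. In the Python
sketch below the payoffs T = 5, R = 3, P = 1, S = 0 and the coefficients
$\alpha = 0.1$, $\beta = -0.2$, $\gamma = 0.2$ are illustrative assumptions, chosen so that
$\tilde p$ satisfies the sign and size requirements of a strategy, namely
p = (0.9, 0.2, 0.7, 0.1). The stationary distribution is approximated by
simple iteration of the chain in floating point, a technique swapped in for
convenience rather than the determinant formula used in the proof, and the
function names are hypothetical.

```python
def markov_matrix(p, q):
    """Transition matrix over (cc, cd, dc, dd); Y responds with (q1, q3, q2, q4)."""
    r = (q[0], q[2], q[1], q[3])
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
            for a, b in zip(p, r)]

def stationary_by_iteration(M, steps=2000):
    """Approximate the stationary distribution by iterating v^T -> v^T M."""
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

# Illustrative payoffs (assumed): T, R, P, S = 5, 3, 1, 0.
S_X = (3, 0, 5, 1)
S_Y = (3, 5, 0, 1)
alpha, beta, gamma = 0.1, -0.2, 0.2

# Press-Dyson vector alpha*S_X + beta*S_Y + gamma*1, and p = Repeat + p_tilde.
p_tilde = [alpha * sx + beta * sy + gamma for sx, sy in zip(S_X, S_Y)]
p = [r + x for r, x in zip((1, 1, 0, 0), p_tilde)]

def press_dyson_residual(q):
    """alpha*s_X + beta*s_Y + gamma for Y's strategy q; expected to be near 0."""
    v = stationary_by_iteration(markov_matrix(p, q))
    s_x = sum(a * b for a, b in zip(v, S_X))
    s_y = sum(a * b for a, b in zip(v, S_Y))
    return alpha * s_x + beta * s_y + gamma

for q in [(0.5, 1 / 3, 2 / 3, 0.25), (0.9, 0.1, 0.8, 0.3), (1, 1, 1, 1)]:
    print(q, press_dyson_residual(q))
```

Whatever Y plays in these examples, the residual vanishes up to rounding
error, so X's choice of p alone pins the pair $(s_X, s_Y)$ to a line.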

If the X and Y players switch strategies then by the symmetry of the game
their payoffs switch. Notice how this works. Let Switch : $\mathbb{R}^4 \to \mathbb{R}^4$ be defined
by Switch$(x_1, x_2, x_3, x_4) = (x_1, x_3, x_2, x_4)$. Notice that Switch interchanges
the vectors $S_X$ and $S_Y$. If X uses p and Y uses q then recall that the response
vectors are p and Switch(q). So if X uses q and Y uses p the response vectors
are q = Switch(Switch(q)) and Switch(p). The new Markov matrix is
obtained from M by transposing both the second and third rows and the
second and third columns. The new stationary distribution is Switch(v)
and so the new payoff to X is Switch$(v)^T S_X$ = Switch$(v)^T$ Switch$(S_Y)$ =
$v^T S_Y = s_Y$. Furthermore, it is easy to check the following.

Proposition 2.5 Assume $\tilde p$ is the X Press-Dyson vector for the strategy
p. If Y uses q = p then $\tilde q$ = Switch$(\tilde p)$ is the Y Press-Dyson vector. In
particular, if $\tilde p = \alpha S_X + \beta S_Y + \gamma \mathbf{1}$ then $\tilde q = \beta S_X + \alpha S_Y + \gamma \mathbf{1}$.

□
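
The XY switch can be verified on an example. The Python sketch below, with
hypothetical function names and arbitrary illustrative strategies, checks that
exchanging the two strategies permutes the second and third rows and columns
of the Markov matrix, and that the Y Press-Dyson vector of a strategy is
Switch applied to its X Press-Dyson vector.

```python
from fractions import Fraction

def switch(x):
    """Interchange the cd and dc coordinates."""
    return (x[0], x[2], x[1], x[3])

def markov_matrix(p, q):
    """Transition matrix over (cc, cd, dc, dd) with response vectors p, Switch(q)."""
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
            for a, b in zip(p, switch(q))]

def press_dyson_x(p):
    """X Press-Dyson vector of (2.2)."""
    return (p[0] - 1, p[1] - 1, p[2], p[3])

def press_dyson_y(q):
    """Y Press-Dyson vector of (2.2)."""
    return (q[0] - 1, q[2], q[1] - 1, q[3])

p = (Fraction(9, 10), Fraction(1, 5), Fraction(7, 10), Fraction(1, 10))
q = (Fraction(1, 2), Fraction(1, 3), Fraction(2, 3), Fraction(1, 4))
sigma = (0, 2, 1, 3)                      # the permutation exchanging cd and dc

M = markov_matrix(p, q)                   # X uses p, Y uses q
M_switched = markov_matrix(q, p)          # X uses q, Y uses p
permuted = [[M[sigma[i]][sigma[j]] for j in range(4)] for i in range(4)]
print(M_switched == permuted)
print(press_dyson_y(p) == switch(press_dyson_x(p)))
```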

We begin our use of this new notation with some elementary observations.
Notice that the association between p and $\tilde p$ is affine and so a convex
combination of strategy vectors is associated with the corresponding convex
combination of Press-Dyson vectors. Now assume that $\tilde p$ is the X
Press-Dyson vector for a strategy p.

• $\tilde p = 0$ iff p = (1, 1, 0, 0), i.e. p = Repeat. p is lazy iff $\tilde p_i = 0$ for at
least three indices i.

• The first and second coordinates of $\tilde p$ are non-positive and the third
and fourth are non-negative. These are the sign constraints on an X
Press-Dyson vector. In addition, $|\tilde p_i| \le 1$ for $i = 1, \dots, 4$. These are the
size constraints. Any vector in $\mathbb{R}^4$ which satisfies both the sign and
the size constraints is an X strategy Press-Dyson vector. Call p a top
strategy for X if $|\tilde p_i| = 1$ for some i. For any strategy p, other than
Repeat, $p = k p^t + (1 - k)$ Repeat for a unique top strategy vector $p^t$
and a unique positive $k \le 1$. Equivalently, $\tilde p = k \tilde p^t$.

• p is agreeable iff $\tilde p_1 = 0$ and is firm iff $\tilde p_4 = 0$. The X Press-Dyson
vector for Tit-for-Tat is $\tilde p = (0, -1, 1, 0)$.
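
These observations translate into a few lines of Python. The sketch below
uses hypothetical function names: it computes the X Press-Dyson vector as the
strategy minus Repeat, tests the sign and size constraints (first two
coordinates non-positive, last two non-negative, all of absolute value at most
1), and splits a strategy other than Repeat into its top strategy and the
mixing weight k.

```python
from fractions import Fraction

REPEAT = (1, 1, 0, 0)

def press_dyson_vector(p):
    """X Press-Dyson vector: the strategy vector minus Repeat."""
    return [Fraction(x) - r for x, r in zip(p, REPEAT)]

def is_press_dyson_vector(pt):
    """Sign constraints and size constraints on a candidate vector."""
    signs_ok = pt[0] <= 0 and pt[1] <= 0 and pt[2] >= 0 and pt[3] >= 0
    return signs_ok and all(abs(x) <= 1 for x in pt)

def top_decomposition(p):
    """Return (k, p_top) with p = k * p_top + (1 - k) * Repeat."""
    pt = press_dyson_vector(p)
    k = max(abs(x) for x in pt)
    if k == 0:
        raise ValueError("Repeat has Press-Dyson vector 0 and no top strategy")
    return k, [r + x / k for x, r in zip(pt, REPEAT)]

# An equal mixture of Tit-for-Tat and Repeat.
k, p_top = top_decomposition((1, Fraction(1, 2), Fraction(1, 2), 0))
print(k, p_top)
```

For the equal mixture of Tit-for-Tat and Repeat the decomposition returns
k = 1/2 and the top strategy (1, 0, 1, 0), that is, Tit-for-Tat.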

We emphasize the top strategies because multiplying $\tilde p$ by a constant does
not affect the Press-Dyson equations:

\[
\tilde p = k(\alpha S_X + \beta S_Y + \gamma \mathbf{1}) \ \text{with} \ k > 0
\quad \Longrightarrow \quad \alpha s_X + \beta s_Y + \gamma = 0.
\tag{2.8}
\]

In order to study the ZDS's it will be helpful to normalize in various ways.
When T > S we can add a common constant to all the payoffs and multiply
all by a common positive number without changing the relationship between
the various strategies. We can thus assume that T = 1 and S = 0.

Recall that for the Prisoner's Dilemma the payoffs are assumed to satisfy

$T > R > P > S$ and $2R > T + S.$ (2.9)

So we will assume

$T = 1,\ S = 0$ and so $1 > R > \frac12,\ R > P > 0.$ (2.10)

The payoff vectors of (1.2) become

$S_X = (R, 0, 1, P)^T, \qquad S_Y = (R, 1, 0, P)^T.$ (2.11)

Now assume that X uses a ZDS with $\tilde p = \alpha S_X + \beta S_Y + \gamma\mathbf{1}$. From the
sign constraints we have

$(\alpha + \beta)R + \gamma \le 0,$
$\beta + \gamma \le 0,$
$\alpha + \gamma \ge 0,$
$(\alpha + \beta)P + \gamma \ge 0.$ (2.12)



Subtracting the fourth inequality from the first we see that $(\alpha+\beta)(R-P) \le 0$
and so $R - P > 0$ implies $\alpha + \beta \le 0$. Then the fourth inequality and
$P > 0$ imply $\gamma \ge 0$ and then the first and fourth imply $\alpha + \beta = 0$ iff $\gamma = 0$.

This leads to the exceptional strategies

$1 \ge \alpha > 0,\quad \beta = -\alpha,\quad \gamma = 0,$
$\tilde p = \alpha\,(0,-1,1,0),$ (2.13)
$p = \alpha(1,0,1,0) + (1-\alpha)(1,1,0,0).$

With $\alpha = 1$ this is the top strategy Tit-for-Tat while with $1 > \alpha > 0$ it is a
mixture of Tit-for-Tat and Repeat.
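Now assume $\gamma > 0$ and we define

$\bar\alpha = \alpha/\gamma, \qquad \bar\beta = \beta/\gamma,$ (2.14)

with the sign constraints

$-P^{-1} \le \bar\alpha + \bar\beta \le -R^{-1},$
$\bar\beta \le -1 \le \bar\alpha.$ (2.15)

For any pair $(\bar\alpha,\bar\beta)$ which satisfies these inequalities we obtain a vector $\tilde p$ which
satisfies the size constraints as well by using $\gamma > 0$ small enough. The largest
value that can be chosen is

$\gamma = [\max(-(\bar\alpha+\bar\beta)R - 1,\ -\bar\beta - 1,\ \bar\alpha + 1,\ (\bar\alpha+\bar\beta)P + 1)]^{-1}$ (2.16)

which yields the top strategy with the pair $(\bar\alpha,\bar\beta)$.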
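As an illustration, the recipe of (2.15) and (2.16) can be sketched in a few lines of Python. This is only a minimal sketch under the normalization T = 1, S = 0; the function name `top_strategy` and the tolerance parameter `tol` are our own choices and do not appear in the text.

```python
def top_strategy(a_bar, b_bar, R, P, tol=1e-12):
    """Top ZDS for a point (a_bar, b_bar) of the ZDS strip, with T = 1, S = 0.

    Returns (gamma, p), where p lists the probabilities of playing c
    after the outcomes cc, cd, dc, dd.
    """
    s = a_bar + b_bar
    # Sign constraints (2.15); tol only absorbs floating point rounding.
    if not (b_bar <= -1 + tol and a_bar >= -1 - tol):
        raise ValueError("need b_bar <= -1 <= a_bar")
    if not (-1 / P - tol <= s <= -1 / R + tol):
        raise ValueError("need -1/P <= a_bar + b_bar <= -1/R")
    # Entries of the Press-Dyson vector divided by gamma.
    unit = [s * R + 1, b_bar + 1, a_bar + 1, s * P + 1]
    # Largest admissible gamma, equation (2.16).
    gamma = 1 / max(abs(u) for u in unit)
    tilde_p = [gamma * u for u in unit]
    # Undo the shift by Repeat = (1, 1, 0, 0) to recover the strategy vector.
    p = [tilde_p[0] + 1, tilde_p[1] + 1, tilde_p[2], tilde_p[3]]
    return gamma, p


# Hypothetical payoffs R = 0.6, P = 0.2 and a point on the line x + y = -1/R.
R, P = 0.6, 0.2
gamma, p = top_strategy(1.0, -1 / R - 1.0, R, P)
```

Because of the sign constraints, the maximum of the absolute values used in the code coincides with the maximum of the four expressions in (2.16).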

The points $(\bar\alpha,\bar\beta)$ lie in the ZDS strip which consists of the points of the
xy plane with $y \le -1 \le x$ and which lie on or below the line $x + y = -R^{-1}$
and on or above the line $x + y = -P^{-1}$. Since $\frac12 < R < 1$, the point $(-1,-1)$
lies below the line $x + y = -R^{-1}$ and the points $(-1,0)$ and $(0,-1)$ lie above
it. The point $(-1,-1)$ lies on or above $x + y = -P^{-1}$, and so is in the strip,
iff $P \le \frac12$. In that case, the top strategy associated with $(\bar\alpha,\bar\beta) = (-1,-1)$
is given by

$p = (2(1-R),\ 1,\ 0,\ 1-2P).$ (2.17)

We call this the Vertex strategy. If $P = \frac12$ then this strategy is firm and lazy
as are all mixtures with Repeat.

Together with the exceptional strategies the ZDS's on the line $x + y = -R^{-1}$
are exactly the agreeable ZDS's ($\tilde p_1 = 0$) and, together with the exceptional
strategies, those on the line $x + y = -P^{-1}$ are exactly the firm ZDS's.

Now we complete the proof of Theorem 2.4 by showing

Proposition 2.6 Except for Repeat with $\tilde p = 0$, the only case when a ZDS is
lazy occurs when $P = \frac12$ in which case $(\bar\alpha,\bar\beta) = (-1,-1)$ yields the only lazy
strategies, mixtures of Vertex and Repeat. These have X Press-Dyson vectors
$\tilde p = (\tilde p_1, 0, 0, 0)$ with $\tilde p_1 < 0$. Nonetheless in these cases, the Press-Dyson
equation

$-s_X - s_Y + 1 = 0$ (2.18)

is satisfied for any strategy q for Y and for any initial plays by X and Y.
Proof: The strategy p is lazy when three or four of the entries of $\tilde p$ are 0.

For the exceptional cases, $p_2 < 1$ and $p_3 > 0$. Hence, $\tilde p_2 < 0$ and $\tilde p_3 > 0$
and so these are not lazy.

If $\tilde p = \gamma(\bar\alpha S_X + \bar\beta S_Y + \mathbf{1})$, then $\tilde p_1 = 0$ when $(\bar\alpha,\bar\beta)$ lies on the line
$x + y = -R^{-1}$ while $\tilde p_4 = 0$ when $(\bar\alpha,\bar\beta)$ lies on the line $x + y = -P^{-1}$. Since
R > P, these do not happen simultaneously.

$\tilde p_2 = \tilde p_3 = 0$ iff $(\bar\alpha,\bar\beta)$ is the point $(-1,-1)$. Since $R > \frac12$ the line
$x + y = -R^{-1}$ lies above this point. So at this point $\tilde p_1 < 0$, i.e. $p_1 < 1$. We
get $\tilde p_4 = p_4 = 0$ when the line $x + y = -P^{-1}$ passes through this point and
so when $P = \frac12$.

Thus, the only lazy possibility for a nonzero $\tilde p$ occurs when $P = \frac12$ and
$(\bar\alpha,\bar\beta) = (-1,-1)$. Then $p = (p_1, 1, 0, 0)$ with $p_1 < 1$. For any strategy q
for Y, there is a terminal set contained in the closed set C = {dc, dd}. Since
$p_1 < 1$ there is always an edge from cc into C. Hence, cc is always transient.

The only possibility of a terminal set which cannot be removed via the
M construction of Lemma 2.1(b), i.e. of a terminal set disjoint from C,
is fixation at cd which occurs when $q_3 = 0$. See the Remark after Lemma 2.1.
However for the terminal set J = {cd}, $v_J = (0,1,0,0)$ and so $v_J^T S_X = 0$
and $v_J^T S_Y = 1$. So in this case as well $\bar\alpha\, v_J^T S_X + \bar\beta\, v_J^T S_Y + 1 = -1 + 1 = 0$.
Thus, we can proceed as we did before with (2.7) to complete the proof.

□ 

Now we are ready to describe the set $\mathcal{S}$. These are the strategies with
$(\bar\alpha,\bar\beta)$ on the line $x + y = -R^{-1}$ and with $\bar\alpha$ positive.

Definition 2.7 Given $\bar\alpha > 0$ the associated sharp strategy has probability
vector p given by

$p = \left(1,\ \frac{2 - R^{-1}}{\bar\alpha + 1},\ 1,\ \frac{1 - P\cdot R^{-1}}{\bar\alpha + 1}\right).$ (2.19)

A strategy is in the collection $\mathcal{S}$ when it is a mixture of a sharp strategy and
the Repeat strategy with nonzero weight on the sharp strategy.

Notice that $1 < R^{-1} < 2$ and $0 < P\cdot R^{-1} < 1$. The sharp strategies, like
Tit-for-Tat, respond to an opponent's play of c with a play of c. In particular,
they are agreeable. However, with positive probabilities depending on $\bar\alpha$ they
will cooperate after an opponent played d, using $\frac{1 - P\cdot R^{-1}}{\bar\alpha+1}$ in the dd case and
$\frac{2 - R^{-1}}{\bar\alpha+1}$ for the remaining outcome. Thus, the sharp strategies are versions
of what is sometimes called Generous Tit-for-Tat. Notice that when mixed
with Repeat the strategies remain agreeable but then $p_3 < 1$.
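The sharp strategy of Definition 2.7 and its mixtures with Repeat are easy to tabulate. The following is a small sketch, again assuming T = 1 and S = 0; the helper names `sharp_strategy` and `mix_with_repeat` are hypothetical and chosen only for this illustration.

```python
def sharp_strategy(a_bar, R, P):
    """Sharp strategy (2.19) for a_bar > 0, payoffs normalized to T = 1, S = 0."""
    if a_bar <= 0:
        raise ValueError("a_bar must be positive")
    return (1.0, (2 - 1 / R) / (a_bar + 1), 1.0, (1 - P / R) / (a_bar + 1))


def mix_with_repeat(p, k):
    """Convex combination k * p + (1 - k) * Repeat, intended for 0 < k <= 1."""
    repeat = (1.0, 1.0, 0.0, 0.0)
    return tuple(k * x + (1 - k) * r for x, r in zip(p, repeat))


# Hypothetical values: R = 0.6, P = 0.2, a_bar = 2, weight 1/2 on the sharp strategy.
p = mix_with_repeat(sharp_strategy(2.0, 0.6, 0.2), 0.5)
```

In this example the first entry stays equal to 1, so the mixture is still agreeable, while the third entry drops below 1.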

Proposition 2.8 A strategy for X is in $\mathcal{S}$ iff it is a ZDS with
$\tilde p = \gamma(\bar\alpha S_X + \bar\beta S_Y + \mathbf{1})$ satisfying $\gamma, \bar\alpha > 0$ and $\bar\alpha + \bar\beta = -R^{-1}$.
In particular, $\mathcal{S}$ is a convex set of strategies.

Proof: The stated conditions on $\tilde p$ say that with $\gamma, \bar\alpha > 0$

$\tilde p = (0,\ -\gamma(\bar\alpha + R^{-1} - 1),\ \gamma(\bar\alpha + 1),\ \gamma(1 - P\cdot R^{-1})).$ (2.20)

With $\bar\alpha$ fixed the largest value for $\gamma$ so that the size constraints hold is
$(\bar\alpha+1)^{-1}$. This is the sharp strategy with $\bar\alpha$. The ones with $0 < \gamma < (\bar\alpha+1)^{-1}$
are mixtures with $\tilde p = 0$ which is the Repeat strategy.

Clearly the conditions on $\tilde p$ are preserved by convex combination. It
follows that the set $\mathcal{S}$ is convex.

□ 

Since Repeat is (1, 1, 0, 0) we clearly have:

$p \in \mathcal{S} \implies p_1 = 1,\ 1 > p_2 > 0,\ p_3 > 0,\ 1 > p_4 > 0.$ (2.21)

Before proceeding to the main results we note the following. 

Lemma 2.9 Assume that $p \in \mathcal{S}$. If Y plays q against p with $q_1 = 1$ and
$q_3 + q_4 > 0$ then {cc} is the unique terminal set for the associated Markov
matrix M.

Proof: Since p and q are both agreeable, {cc} is a terminal set for M.

Since $p_3, p_4 > 0$, there is an edge from dc to either cc or cd and from dd
to cc or cd. It remains to show that cd is transient.

If $q_3 > 0$ then $p_2 > 0$ implies there is an edge from cd to cc and so cd is
transient.

If $q_3 = 0$ then $1 > p_2$ implies there is an edge from cd to dd. Furthermore,
$q_4 > 0$ since $q_3 + q_4 > 0$ by hypothesis. Hence, $p_4 > 0$ implies there is an
edge from dd to cc. So in this case as well, dd and hence cd are transient.

Thus, cc is the only recurrent state and {cc} is the unique terminal set.

□ 

We observe that R is the maximum average payoff.

Proposition 2.10 For any pair of programs for X and Y

$\frac12(s_Y + s_X) \le R.$ (2.22)

In particular, if $s_Y \ge R$ then $s_X \le R$.

Proof: Observe that R is the largest entry of

$\frac12(S_X + S_Y) = (R,\ \tfrac12,\ \tfrac12,\ P)^T;$ (2.23)

consequently on every round the average of the two payoffs is bounded by R.

□



Now we arrive at the main results. 

Theorem 2.11 Assume that X uses the Tit-for-Tat program or more generally,
an exceptional strategy with initial play c.

(a) If Y chooses an agreeable program then the outcome sequence is immediately
fixed at cc and, a fortiori, the long term payoffs satisfy $s_X = s_Y = R$.

(b) For any program for Y, $s_X = s_Y$ and so if $s_Y \ge R$ then $s_X = s_Y = R$.

Proof: (a) An exceptional strategy with initial play c is an agreeable program
and so (a) is clear.

(b) The exceptional strategies are ZDS's with $\alpha = -\beta > 0$ and $\gamma = 0$.
By Theorem 2.4 the Press-Dyson equation says that for any program for Y,
$s_X - s_Y = 0$. If $s_Y \ge R$ then by Proposition 2.10 $s_X \le R$. Hence, $s_X = s_Y$
implies that both equal R.

□

Theorem 2.12 Assume that X chooses a strategy vector p from $\mathcal{S}$.

(a) If Y chooses a strategy vector q with $q_1 = 1$ and $q_3 + q_4 > 0$ then
the associated Markov matrix M is convergent with {cc} the unique
terminal set. The long term payoffs are equal at the cooperative level,
i.e. $s_X = s_Y = R$. In particular, the choices for Y include the strategies
in $\mathcal{S}$ and the exceptional strategies.

(b) For any strategy choice for Y and choice of initial plays for X and Y,
$s_Y \ge R$ implies $s_Y = s_X = R$.

Proof: (a) By Lemma 2.9 the Markov matrix M is convergent with {cc}
the unique terminal set. Hence, the unique stationary distribution is $v^T = (1,0,0,0)$.
Hence, $s_X = v^T S_X = R$ and $s_Y = v^T S_Y = R$. All of the ZDS
strategies with $\tilde q = g(\bar b S_X + \bar a S_Y + \mathbf{1})$ such that $\bar a + \bar b = -R^{-1}$ are
agreeable and satisfy $q_4 > 0$. In particular, this includes all the strategies
in $\mathcal{S}$. The exceptional strategies, including Tit-for-Tat, are agreeable with
$q_3 > 0$.

(b) By Theorem 2.4 the Press-Dyson equation for the ZDS p implies
$\bar\alpha s_X + \bar\beta s_Y = -1$. The $\mathcal{S}$ strategy p satisfies $\bar\alpha + \bar\beta = -R^{-1}$. Substituting for
$\bar\beta$ and multiplying by $-1$ we obtain

$R^{-1}s_Y + \bar\alpha(s_Y - s_X) = 1.$ (2.24)

If $s_Y \ge R$ then Proposition 2.10 implies $s_X \le R$ and so $s_Y - s_X \ge 0$.
Because $\bar\alpha > 0$ for p in $\mathcal{S}$

$s_Y \ge R \implies R^{-1}s_Y \ge 1$ and $\bar\alpha(s_Y - s_X) \ge 0.$ (2.25)

Now equation (2.24) implies $R^{-1}s_Y = 1$ and $\bar\alpha(s_Y - s_X) = 0$. Using $\bar\alpha > 0$
again we get $s_Y = R$ and $s_Y = s_X$.
□
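Equation (2.24) can be checked numerically for a concrete pair of strategies. The sketch below is only an illustration under stated assumptions: payoffs are normalized to T = 1, S = 0, the long term payoffs are approximated by a Cesaro average of the outcome distribution (our choice of method, not the paper's), and the names `markov_matrix` and `long_run_payoffs` are hypothetical.

```python
def markov_matrix(p, q):
    """Transition matrix on the outcomes (cc, cd, dc, dd), listed from X's viewpoint."""
    # Y sees the outcome cd as dc, so in this ordering Y responds with (q1, q3, q2, q4).
    q_in_x_order = (q[0], q[2], q[1], q[3])
    return [[px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
            for px, py in zip(p, q_in_x_order)]


def long_run_payoffs(p, q, R, P, start=0, steps=20000):
    """Approximate (s_X, s_Y) by averaging the outcome distribution over time."""
    M = markov_matrix(p, q)
    v = [0.0] * 4
    v[start] = 1.0                      # initial outcome: 0 = cc, 1 = cd, 2 = dc, 3 = dd
    avg = [0.0] * 4
    for _ in range(steps):
        avg = [a + x for a, x in zip(avg, v)]
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    avg = [a / steps for a in avg]
    S_X, S_Y = (R, 0.0, 1.0, P), (R, 1.0, 0.0, P)   # payoff vectors (2.11)
    s_X = sum(a * s for a, s in zip(avg, S_X))
    s_Y = sum(a * s for a, s in zip(avg, S_Y))
    return s_X, s_Y


# Hypothetical example: X plays the sharp strategy with a_bar = 1, Y always defects.
R, P, a_bar = 0.6, 0.2, 1.0
p = (1.0, (2 - 1 / R) / (a_bar + 1), 1.0, (1 - P / R) / (a_bar + 1))
s_X, s_Y = long_run_payoffs(p, (0.0, 0.0, 0.0, 0.0), R, P)
lhs = s_Y / R + a_bar * (s_Y - s_X)     # left side of (2.24), should be close to 1
```

In this example the defector ends up ahead of X but below the cooperative level, in line with part (b).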

As we will later see, Y can choose a strategy so that $s_Y > s_X$ but (b)
says that this can only happen when $s_Y < R$. In fact, in that case it always
happens.

Addendum 2.13 Assume that X plays a strategy from $\mathcal{S}$, then for any
strategy for Y and choice of initial plays

$s_Y < R \implies s_X < s_Y < R.$ (2.26)

In particular, either both payoffs equal R or both are less than R.

Proof: $s_Y < R$ says that $R^{-1}s_Y < 1$. So equation (2.24) implies that
$\bar\alpha(s_Y - s_X) > 0$. Since $\bar\alpha > 0$, $s_Y > s_X$.

By Theorem 2.12(b) $s_Y > R$ cannot happen, and so either both payoffs
equal R or by (2.26) both are less than R.

□ 

Remark: If X plays an exceptional strategy then by Theorem 2.11 always
$s_Y = s_X \le R$ and so again, either both payoffs equal R or both are less than
R.

Thus, in the language of Definition 1.3 Tit-for-Tat and the strategies in
$\mathcal{S}$ are good.

At the edge of the line of strategies in $\mathcal{S}$ lie the strategies with $\bar\alpha = 0$ and
so $\tilde p = \gamma(-R^{-1}S_Y + \mathbf{1})$. So the top strategy is given by
$p = (1,\ 2 - R^{-1},\ 1,\ 1 - R^{-1}P)$. This is an example of a type of special strategy considered by Press
and Dyson and earlier by Boerlijst, Nowak and Sigmund (1997), who called
them equalizer strategies. The equalizers are on the vertical line $\bar\alpha = 0$. Each
fixes the opponent's payoff. If both players choose equalizer strategies on the
line $x + y = -R^{-1}$ then the joint cooperative payoff is achieved. While there
is no incentive for either player to move away when both are using them, i.e.
such a pair is a Nash equilibrium, Y also has no incentive to choose an equalizer
strategy when X does.

In the other direction, we can move along the line $x + y = -R^{-1}$ letting $\bar\alpha$
tend to infinity. We then approach the exceptional strategies. As $\bar\alpha$ tends to
infinity in (2.19) the sharp strategy approaches p = (1, 0, 1, 0) = Tit-for-Tat.

At the end of the introduction we pointed out that if p and q are agreeable
strategies with convergent Markov matrix M then {cc} is the unique terminal
set and so starting from any of the three transient outcomes {cd, dc, dd} we
move along a sequence of states which hits cc with probability one. It is easy
to compute the expected number of steps $T_z$ from transient state z to cc.

$T_z = 1 + \sum_{z'} p_{zz'}\,T_{z'},$ (2.27)

where we sum over the three transient states and $p_{zz'}$ is the probability of
moving along an edge from z to z'. Thus, with $M' = M - I$, we obtain the
formula for the vector $T = (T_2, T_3, T_4)$:

$M'_t \cdot T = -\mathbf{1},$ (2.28)

where $M'_t$ is the invertible 3x3 matrix obtained from $M'$ by omitting the
first row and column.

Consider the case when X and Y use the same sharp strategy, $p = q = (1, p_2, 1, p_4)$.
The only edges coming from cd connect with cc or with dc and
similarly for dc. Symmetry will imply that $T_{cd} = T_{dc}$. So with T this common
value we obtain from (2.27) $T = 1 + (1 - p_2)T$. Hence, from (2.19) we get

$T = T_{cd} = T_{dc} = \frac{1}{p_2} = \frac{\bar\alpha + 1}{2 - R^{-1}}.$ (2.29)

Thus, the closer the strategy is to the equalizer strategy with $\bar\alpha = 0$ the
shorter the expected recovery time from an error leading to a dc or cd outcome.
From (2.27) one can see that

$T_{dd} = 1 + 2p_4(1-p_4)\cdot T + (1-p_4)^2\cdot T_{dd}.$ (2.30)

We won't examine this further as arriving at dd from cc implies errors on the
part of both players.
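For readers who want numbers, the recovery times can be computed directly. The sketch below evaluates the closed forms that follow from (2.29) and (2.30) and compares them with a fixed point iteration of (2.27); the iteration is our own choice of solver and the function names are hypothetical.

```python
def recovery_times(p2, p4):
    """Closed forms for T = T_cd = T_dc and T_dd when both players use (1, p2, 1, p4)."""
    T = 1.0 / p2                                          # equation (2.29)
    T_dd = (1 + 2 * p4 * (1 - p4) * T) / (1 - (1 - p4) ** 2)   # (2.30) solved for T_dd
    return T, T_dd


def recovery_times_iterative(p2, p4, sweeps=5000):
    """Iterate T <- 1 + Q T, where Q holds the transitions among cd, dc, dd."""
    Q = [[0.0, 1 - p2, 0.0],                              # from cd: to dc with prob 1 - p2
         [1 - p2, 0.0, 0.0],                              # from dc: to cd with prob 1 - p2
         [p4 * (1 - p4), (1 - p4) * p4, (1 - p4) ** 2]]   # from dd
    T = [0.0, 0.0, 0.0]
    for _ in range(sweeps):
        T = [1 + sum(Q[i][j] * T[j] for j in range(3)) for i in range(3)]
    return T


# Hypothetical sharp strategy with R = 0.6, P = 0.2, a_bar = 1: p2 = 1/6, p4 = 1/3.
T, T_dd = recovery_times(1 / 6, 1 / 3)
T_iter = recovery_times_iterative(1 / 6, 1 / 3)
```

The iteration converges slowly when $p_2$ is very small, so for such cases a direct linear solve of (2.28) would be the more sensible choice.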

Of course, one might regard such departures from cooperation not as
noise or error but as ploys. Y might try a rare move to cd in order to pick
up the temptation payoff for defection as an occasional bonus. But if this
is strategy rather than error, it means that Y is departing from the sharp
strategy to one with $q_1$ a bit less than 1 and so which is no longer agreeable.
As we will see in the next section, moving below the $x + y = -R^{-1}$ line
may allow Y to do strictly better than X assuming that X stays high, but
Theorem 2.12 implies that in that case Y's payoff also decays below R and
so Y loses as well by executing such a ploy.

We conclude by noting that the results of Theorems 2.11 and 2.12 are
still true if Y responds to an X strategy in $\mathcal{S}$ with a longer memory strategy.
That is, Y responds using not just the previous outcome but the N previous
outcomes for some N > 1. The states of the resulting Markov chain are not
single outcomes but sequences of N outcomes. In Appendix A of Press and
Dyson (2012) the authors show that when this larger Markov chain has stabilized,
the payoffs $s_X$ and $s_Y$ are the same as those which the players would
have received had Y adopted a certain 1 step Markov strategy against X's
original 1 step strategy. Thus, the Press-Dyson equations and the conclusions
of Theorems 2.11 and 2.12 continue to hold.
3 Competing Zero Determinant Strategies



Now let us consider what happens when both players use a non-exceptional
ZDS.

Lemma 3.1 For $(\bar\alpha,\bar\beta)$ in the ZDS strip, $-\bar\beta \ge 1$ and $-\bar\beta \ge |\bar\alpha|$ with $-\bar\beta = |\bar\alpha|$
iff $\bar\alpha = \bar\beta = -1$. If $(\bar a,\bar b)$ is also in the strip then $D = \bar\beta\bar b - \bar\alpha\bar a \ge 0$ with
equality iff $\bar\alpha = \bar\beta = \bar a = \bar b = -1$.

Proof: $\bar\alpha + \bar\beta = -z^{-1}$ with $P \le z \le R$ and so $-\bar\beta = \bar\alpha + z^{-1} > \bar\alpha$. Also,
the sign constraints imply $-\bar\beta \ge 1 \ge -\bar\alpha$, and so $-\bar\beta \ge -\bar\alpha$ with equality iff
$\bar\alpha = \bar\beta = -1$. $D \ge (-\bar\beta)(-\bar b) - |\bar\alpha||\bar a| \ge 0$ and the inequality is strict unless
$\bar\alpha = \bar\beta = \bar a = \bar b = -1$.

□ 

Now we compute what happens when X and Y use ZDS strategies associated,
respectively, with points $(\bar\alpha,\bar\beta)$ and $(\bar a,\bar b)$ in the ZDS strip. This means
that for some $\gamma > 0$, $\tilde p = \gamma(\bar\alpha S_X + \bar\beta S_Y + \mathbf{1})$. On the other hand, for the Y
Press-Dyson vector we apply Switch and so $\tilde q = g(\bar b S_X + \bar a S_Y + \mathbf{1})$ for some $g > 0$. See
Proposition 2.5. We obtain two Press-Dyson equations which hold simultaneously

$\bar\alpha s_X + \bar\beta s_Y = -1,$
$\bar b s_X + \bar a s_Y = -1.$ (3.1)

If $\bar\alpha = \bar\beta = \bar a = \bar b = -1$, i.e. both players use the Vertex strategy, then the
two equations are the same. In that case, since the two players use the same
strategy, $s_X = s_Y$. Then the single equation of (3.1) yields $s_X = s_Y = \frac12$.
Recall that the Vertex strategy is only defined when $P \le \frac12$.

Otherwise the determinant $D = \bar\beta\bar b - \bar\alpha\bar a$ is positive and we get

$s_X = D^{-1}(\bar a - \bar\beta), \qquad s_Y = D^{-1}(\bar\alpha - \bar b),$
and so $s_Y - s_X = D^{-1}[(\bar\alpha + \bar\beta) - (\bar a + \bar b)].$ (3.2)

Notice that $s_X$ and $s_Y$ are independent of $\gamma$ and g.
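The solution (3.2) of the linear system (3.1) is short enough to code directly. This is a minimal sketch; the function name `zds_payoffs` and the sample points are hypothetical, and the points are assumed to lie in the ZDS strip.

```python
def zds_payoffs(alpha_bar, beta_bar, a_bar, b_bar):
    """Long term payoffs (s_X, s_Y) from (3.2) for two non-exceptional ZDS's.

    X uses the point (alpha_bar, beta_bar) and Y uses (a_bar, b_bar).
    """
    D = beta_bar * b_bar - alpha_bar * a_bar
    if D <= 0:
        # Inside the strip this only happens when both points equal (-1, -1).
        raise ValueError("D must be positive; (3.2) does not apply")
    return (a_bar - beta_bar) / D, (alpha_bar - b_bar) / D


# Hypothetical example with R = 0.6: X on the line x + y = -1/R, Y on x + y = -1/0.4.
R = 0.6
s_X, s_Y = zds_payoffs(1.0, -1 / R - 1.0, 0.5, -3.0)
```

Here X sits on the higher line, and the computed payoffs satisfy $s_Y > s_X$, as the sign of $s_Y - s_X$ in (3.2) predicts.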

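Proposition 3.2 Assume that $\tilde p = \gamma(\bar\alpha S_X + \bar\beta S_Y + \mathbf{1})$ and
$\tilde q = g(\bar b S_X + \bar a S_Y + \mathbf{1})$.

(a) The points $(\bar\alpha,\bar\beta)$, $(\bar a,\bar b)$ lie on the same line $x + y = -z^{-1}$ for some z
with $P \le z \le R$ iff $s_X = s_Y$. In that case, the common value is z, i.e.
$s_X = s_Y = z$.

(b) $s_Y > s_X$ iff $(\bar\alpha + \bar\beta) > (\bar a + \bar b)$.

(c) Now assume that $(\bar\alpha + \bar\beta) > (\bar a + \bar b)$.

$s_X < -(\bar\alpha + \bar\beta)^{-1}$ and $s_Y > -(\bar a + \bar b)^{-1}.$ (3.3)

$\bar\alpha > 0 \implies s_Y < -(\bar\alpha + \bar\beta)^{-1},$
$\bar a > 0 \implies s_X > -(\bar a + \bar b)^{-1},$ (3.4)

and

$\bar\alpha < 0 \implies s_Y > -(\bar\alpha + \bar\beta)^{-1},$
$\bar a < 0 \implies s_X < -(\bar a + \bar b)^{-1}.$ (3.5)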



Proof: (a) Assume $\bar\alpha + \bar\beta = \bar a + \bar b$. From (3.2) we see that $s_Y - s_X = 0$.

When $s_X = s_Y$ then the Press-Dyson equations (3.1) imply $\bar\alpha s_X + \bar\beta s_X =
\bar\alpha s_X + \bar\beta s_Y = -1$ and so the common value is $-(\bar\alpha + \bar\beta)^{-1}$. Similarly, the
common value is $-(\bar a + \bar b)^{-1}$. Hence, $\bar\alpha + \bar\beta = \bar a + \bar b$ and the points lie on the
same line.

(b) When both players use Vertex, the two points lie on the same line
and $s_X = s_Y$. Otherwise, D > 0. Then (b) follows from (3.2).

(c) From (b), $1 \ge s_Y > s_X \ge 0$. This excludes the D = 0 case and so

$r =_{def} s_X/s_Y = (\bar a - \bar\beta)/(\bar\alpha - \bar b)$ and $s =_{def} r^{-1} = s_Y/s_X$ (3.6)

satisfy $\infty \ge s > 1 > r \ge 0$. Notice that $s_X = 0$ iff the inequalities $\bar a \ge -1 \ge \bar\beta$
are all equalities.

Substituting $s_X = r s_Y$ in the original equations (3.1) we get

$s_Y = -(r\bar\alpha + \bar\beta)^{-1} = -(r\bar b + \bar a)^{-1},$ (3.7)

and if $s_X > 0$

$s_X = -(\bar\alpha + s\bar\beta)^{-1} = -(\bar b + s\bar a)^{-1}.$ (3.8)

Because $\bar b, \bar\beta < 0$, $r\bar b + \bar a > \bar a + \bar b$ and $\bar\alpha + s\bar\beta < \bar\alpha + \bar\beta$ which imply (3.3) except
when $s_X = 0$ for which it is obvious.
The proofs for (3.4) and (3.5) are just the same. Notice that $\bar a > 0$ implies
$s_X > 0$.
□

The result in Proposition 3.2(b) is puzzling because it says that the player
on the lower line gets the larger payoff. The cooperative strategies in $\mathcal{S}$ all lie
on the highest line. This means that starting on the line $x + y = -R^{-1}$ with
X playing a strategy in $\mathcal{S}$, Y can move to a lower line and then the payoffs
will be such that $s_Y > s_X$. That is, having moved away from the $\mathcal{S}$ strategies
Y does better than X. As (3.3) indicates, $s_X$ is lower than the cooperative
value $-(\bar\alpha + \bar\beta)^{-1} = R$. However, because $\bar\alpha > 0$ for a strategy in $\mathcal{S}$, (3.4)
implies $s_Y < R$ as well. Thus, Y's move away from the $\mathcal{S}$ line of strategies
causes the payoff for X to decay from R but the payoff to Y does so as well.
Compare Addendum 2.13.

Since Axelrod's original tournaments, a great deal of interest has been 
focussed on effects of repeated, competitive play among a population of 
strategies. Much of this work has focussed on numerical simulation, see, 
e.g. Stewart and Plotkin (2012) although there has been analytic work as 
well, see Hofbauer and Sigmund (1998) Chapter 9. So we turn now to the 
dynamics of such competition. 

The dynamics that we consider takes place in the context of a symmetric
two-person game, but generalizing our initial description, we merely assume
that there is a finite set of strategies indexed by J. When players X and
Y use strategies with index $i, j \in J$, respectively, then the payoff to player
X is given by $A_{ij}$ and the payoff to Y is $A_{ji}$. Thus, the game is described
by the payoff matrix $\{A_{ij}\}$. We imagine a population of players each using
a particular strategy for each encounter and let $\pi_i$ denote the ratio of the
number of i players to the total population. The frequency vector $\{\pi_i\}$ lives
in the unit simplex $\Delta \subset \mathbb{R}^J$, i.e. the entries are nonnegative and sum to 1.
The vertex v(i) associated with $i \in J$ corresponds to a population consisting
entirely of i players. We assume the population is large so that we can regard
$\pi$ as changing continuously in time.

Now we regard the payoff in units of fitness. That is, when an i player
meets a j player in an interval of time dt, the payoff $A_{ij}$ is an addition to
the reproductive rate R of the members of the population. So the i player is
replaced by $1 + (R + A_{ij})dt$ i players. Averaging over the current population
distribution, the expected relative reproductive rate for the subpopulation of
i players is $R + A_{i\pi}$, where

$A_{i\pi} = \sum_{j \in J} \pi_j A_{ij}$ and $A_{\pi\pi} = \sum_{i \in J} \pi_i A_{i\pi}.$ (3.9)

The resulting dynamical system on $\Delta$ is given by the Taylor-Jonker Game
Dynamics Equations, see Taylor and Jonker (1978) and Akin (1990).

$\frac{d\pi_i}{dt} = \pi_i(A_{i\pi} - A_{\pi\pi}).$ (3.10)

Observe that for each i the vertex v(i), representing fixation at the i strategy,
is an equilibrium.

This system is one of the examples of the replicator equations studied in
great detail in Hofbauer and Sigmund (1998).
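To make (3.9) and (3.10) concrete, here is a small numerical sketch. It uses an explicit Euler step, which is our own choice of integrator and not something prescribed by the text, and the 2x2 payoff matrix in the example is purely hypothetical.

```python
def replicator_step(pi, A, dt):
    """One explicit Euler step of d pi_i/dt = pi_i (A_{i pi} - A_{pi pi}), equation (3.10)."""
    n = len(pi)
    A_i = [sum(A[i][j] * pi[j] for j in range(n)) for i in range(n)]   # A_{i pi} of (3.9)
    A_avg = sum(pi[i] * A_i[i] for i in range(n))                      # A_{pi pi} of (3.9)
    return [pi[i] * (1 + dt * (A_i[i] - A_avg)) for i in range(n)]


def simulate(pi0, A, dt=0.01, steps=5000):
    """Repeat the Euler step; the entries of pi keep summing to 1 up to rounding."""
    pi = list(pi0)
    for _ in range(steps):
        pi = replicator_step(pi, A, dt)
    return pi


# Hypothetical payoff matrix in which strategy 0 dominates strategy 1.
A = [[3.0, 1.0],
     [2.0, 0.0]]
pi_final = simulate([0.1, 0.9], A)
```

Each vertex is a fixed point of the step, since $\pi_i = 1$ forces $A_{i\pi} = A_{\pi\pi}$.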

To apply this to our case, we suppose that J indexes a finite collection
of Markov programs, i.e. a strategy vector $p^i$ and an initial play, pure or
mixed. We then use

$A_{ij} = s_X$ so that $A_{ji} = s_Y.$ (3.11)

That is, when the X player uses the i program and the Y player uses the
j program then the players receive the payoffs $s_X$ and $s_Y$ as additions to
their reproductive rate. In the case that the associated Markov matrix is
convergent, there is a unique terminal set, and the long term payoffs, $s_X$, $s_Y$
are independent of the initial plays.

The programs given in Theorems 2.11 and 2.12 lead to locally stable
equilibria.

Theorem 3.3 Assume that

(i) For some $i^* \in J$ the associated strategy is good. In addition, the $i^*$
program uses initial play c, or the Markov matrix is convergent when
both players use $p^{i^*}$. So the payoff $A_{i^*i^*} = R$.

(ii) For all $j \ne i^*$ in J, if X uses the $i^*$ program and Y uses the j program
then $s_Y < R$. That is, $A_{ji^*} < A_{i^*i^*}$.

The equilibrium $v(i^*)$ is an attractor, i.e. a locally stable equilibrium. In
fact, there exists $\epsilon > 0$ such that

$1 > \pi_{i^*} > 1 - \epsilon \implies \frac{d\pi_{i^*}}{dt} > 0.$ (3.12)

Thus, near the equilibrium $v(i^*)$ given by $\pi_{i^*} = 1$, $\pi_{i^*}$ increases monotonically,
converging to 1 and the alternative strategies are eliminated from the
population in the limit.

Proof: We are assuming that for all $j \ne i^*$, $A_{ji^*} < A_{i^*i^*}$.
It then follows for $\epsilon > 0$ sufficiently small that for $\pi \in \Delta$, $\pi_{i^*} > 1 - \epsilon$
implies $A_{i^*\pi} > A_{j\pi}$. If also $1 > \pi_{i^*}$, then $A_{i^*\pi} > A_{\pi\pi}$. So (3.10) implies
(3.12).

□

Remarks: The condition $A_{i^*i^*} > A_{ji^*}$ for all $j \ne i^*$ says that $i^*$ is an
evolutionarily stable strategy as defined by John Maynard Smith. The local
stability given above holds in general for ESS's. So in that sense the
goodness of the $i^*$ strategy was superfluous. However, if $p^{i^*}$ is good then
the only cases that we are excluding in (ii) lead to degeneracies which we
will describe below. That is, suppose that $p^j$ were a strategy which when
played against $p^{i^*}$ obtains a payoff $s_Y \ge R$. Because $p^{i^*}$ is good, we have
$s_X = s_Y = R$, i.e. $A_{ji^*} = A_{i^*j} = R$.

Thus, the dynamics provides additional support for the use of the $\mathcal{S}$ strategies.

To investigate the dynamics further, we will analyze the case when all
the strategies indexed by J are ZDS's with the exceptional strategies and
the Vertex strategy excluded. We can thus regard J as listing a finite set of
points $(\bar\alpha_i, \bar\beta_i)$ in the appropriate region. X uses p associated with $(\bar\alpha_i, \bar\beta_i)$
when $\tilde p = \gamma_i(\bar\alpha_i S_X + \bar\beta_i S_Y + \mathbf{1})$ and Y uses q associated with $(\bar\alpha_j, \bar\beta_j)$ when
$\tilde q = \gamma_j(\bar\beta_j S_X + \bar\alpha_j S_Y + \mathbf{1})$ for some $\gamma_i, \gamma_j > 0$. Notice the XY switch. Thus,
we apply (3.1) with $(\bar\alpha,\bar\beta) = (\bar\alpha_i,\bar\beta_i)$ and $(\bar a,\bar b) = (\bar\alpha_j,\bar\beta_j)$. Then from (3.2)
we get

$A_{ij} = s_X = K_{ij}(\bar\alpha_j - \bar\beta_i)$
with $K_{ij} = K_{ji} = (\bar\beta_i\bar\beta_j - \bar\alpha_i\bar\alpha_j)^{-1} > 0.$ (3.13)

Note that the payoffs are independent of the choice of $\gamma_i, \gamma_j$.
We begin with some degenerate cases.

First, if all of the points $(\bar\alpha_j,\bar\beta_j)$ lie on the same line $x + y = -z^{-1}$
then by Proposition 3.2(a) $A_{ij} = z$ for all i, j and so $\frac{d\pi}{dt} = 0$ and every
population distribution is an equilibrium. In general, if for two strategies
i, j $A_{ij} = A_{ji} = z$ then by Proposition 3.2(a) both points lie on $x + y = -z^{-1}$
and it follows that $A_{ii} = A_{jj} = z$ as well. The dynamics is degenerate on the
face where every member of the population uses one of these strategies.

Second, if all of the points satisfy $\bar\alpha_i = 0$ then all the strategies are
equalizer strategies. In this case the payoff matrix need not be constant but
$A_{ij}$ depends only on j. This implies that for all i $A_{i\pi} = A_{\pi\pi}$ and so again
$\frac{d\pi}{dt} = 0$ and every population distribution is an equilibrium.

We will now see that the line $\bar\alpha = 0$ separates different interesting dynamic
behaviors.

Theorem 3.4 (a) Assume that for some $i^* \in J$

$\bar\alpha_{i^*} + \bar\beta_{i^*} > \bar\alpha_j + \bar\beta_j$ for all $j \ne i^*$. (3.14)

If $\bar\alpha_{i^*} > 0$ then the equilibrium $v(i^*)$ is an attractor, i.e. it is a locally
stable equilibrium. In fact, there exists $\epsilon > 0$ such that

$1 > \pi_{i^*} > 1 - \epsilon \implies \frac{d\pi_{i^*}}{dt} > 0.$ (3.15)

Thus, near the equilibrium $v(i^*)$ given by $\pi_{i^*} = 1$, $\pi_{i^*}$ increases monotonically,
converging to 1.

If $\bar\alpha_{i^*} < 0$ then the equilibrium $v(i^*)$ is a repellor, i.e. it is a locally
unstable equilibrium. In fact, there exists $\epsilon > 0$ such that

$1 > \pi_{i^*} > 1 - \epsilon \implies \frac{d\pi_{i^*}}{dt} < 0.$ (3.16)

Thus, near the equilibrium $v(i^*)$ given by $\pi_{i^*} = 1$, $\pi_{i^*}$ decreases monotonically
until the system enters, and remains in, the region where $\pi_{i^*} \le 1 - \epsilon$.

(b) Assume that for some $i^{**} \in J$

$\bar\alpha_{i^{**}} + \bar\beta_{i^{**}} < \bar\alpha_j + \bar\beta_j$ for all $j \ne i^{**}$. (3.17)

If $\bar\alpha_{i^{**}} < 0$ then the equilibrium $v(i^{**})$ is an attractor. There exists $\epsilon > 0$
such that

$1 > \pi_{i^{**}} > 1 - \epsilon \implies \frac{d\pi_{i^{**}}}{dt} > 0.$ (3.18)

If $\bar\alpha_{i^{**}} > 0$ then the equilibrium $v(i^{**})$ is a repellor. There exists $\epsilon > 0$
such that

$1 > \pi_{i^{**}} > 1 - \epsilon \implies \frac{d\pi_{i^{**}}}{dt} < 0.$ (3.19)

Proof: (a) Suppose $\bar\alpha_{i^*} + \bar\beta_{i^*} = -z^{-1}$. When both players use the $i^*$
strategy they receive the common payoff of z. See Proposition 3.2(a).

Assume $\bar\alpha_{i^*} > 0$. If one player moves to a j strategy with $\bar\alpha_j + \bar\beta_j$ on a
lower line then from (3.3) and (3.4) of Proposition 3.2(c) both players obtain
a payoff less than z. That is, for all $j \ne i^*$

$A_{i^*i^*} > \max(A_{i^*j}, A_{ji^*}).$ (3.20)

Just as in Theorem 3.3 we then obtain an $\epsilon > 0$ such that (3.15) holds.

If, instead $\bar\alpha_{i^*} < 0$ and player Y moves to an alternative j strategy, then
by (3.3) it is again true that the payoff $s_X$ is smaller than z. But now (3.5)
implies that $s_Y > z$. Thus, for all $j \ne i^*$

$A_{ji^*} > A_{i^*i^*} > A_{i^*j}.$ (3.21)

Hence, there exists an $\epsilon > 0$ such that $\pi_{i^*} > 1 - \epsilon$ implies $A_{j\pi} > A_{i^*\pi}$ for all
$j \ne i^*$. Averaging we obtain $A_{\pi\pi} > A_{i^*\pi}$ when $1 > \pi_{i^*} > 1 - \epsilon$. Then (3.16)
follows. It implies that the system cannot leave the region where $\pi_{i^*} \le 1 - \epsilon$.

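The proof for (b) is completely analogous. Here we apply (3.3), (3.4) and
(3.5) with X and Y switched to get

$\bar\alpha_{i^{**}} > 0 \implies \min(A_{i^{**}j}, A_{ji^{**}}) > A_{i^{**}i^{**}},$
$\bar\alpha_{i^{**}} < 0 \implies A_{i^{**}j} > A_{i^{**}i^{**}} > A_{ji^{**}}.$ (3.22)

□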

Remarks: 1- The $\mathcal{S}$ strategies lie on the highest line and satisfy $\bar\alpha > 0$.
So the first part of (a) applies to them. This is a special case of Theorem 

2- If there is a proper subset $J^*$ of strategies on the highest line and all
with $\bar\alpha_i > 0$ then on the face of $\Delta$ where $\pi_{J^*} = \sum_{j \in J^*}\pi_j$ equals 1 the dynamic
is degenerate and for $\epsilon > 0$ small enough, $1 > \pi_{J^*} > 1 - \epsilon$ implies $\frac{d\pi_{J^*}}{dt} > 0$.

It follows that the local stability of an $\mathcal{S}$ strategy need not be global.
To illustrate this, consider the case of two strategies indexed by J = {1,2}.
Letting $w = \pi_1$ it is an easy exercise to show that (3.10) reduces to

$\frac{dw}{dt} = w(1-w)[(A_{11} - A_{21})w + (A_{12} - A_{22})(1-w)].$ (3.23)

Corollary 3.5 Assume that $\bar\alpha_1 + \bar\beta_1 > \bar\alpha_2 + \bar\beta_2$ and that $\bar\alpha_1 \cdot \bar\alpha_2 < 0$. There
is an equilibrium population containing both strategies with

$w/(1-w) = (A_{22} - A_{12})/(A_{11} - A_{21}).$ (3.24)

This equilibrium is stable if $\bar\alpha_1 < 0$ and is unstable if $\bar\alpha_1 > 0$.

Proof: If $\bar\alpha_1 < 0$ and $\bar\alpha_2 > 0$ then (3.21) and (3.22) imply that $A_{11} - A_{21} < 0$
and $A_{12} - A_{22} > 0$ and reversing the signs reverses the inequalities. The
result then easily follows from equation (3.23). Just graph the linear function
of w in the brackets and observe where the result is positive or negative. □

Remark: In particular if strategy 1 is in $\mathcal{S}$ and $\bar\alpha_2 < 0$, then both vertices
are attractors and the domains of attraction in the interval $w \in [0,1]$ are
separated by the unstable equilibrium given by (3.24).
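The interior equilibrium of Corollary 3.5 can be computed from the payoff formula (3.13). The sketch below is illustrative only: the names `payoff` and `mixed_equilibrium` are ours, and the two sample points assume R = 0.6 and a value of P small enough (for instance P = 0.2) that both points lie in the ZDS strip.

```python
def payoff(alpha_i, beta_i, alpha_j, beta_j):
    """A_ij from (3.13): payoff to an i player who meets a j player."""
    return (alpha_j - beta_i) / (beta_i * beta_j - alpha_i * alpha_j)


def mixed_equilibrium(point1, point2):
    """Interior equilibrium w = pi_1 from (3.24) for two ZDS points (alpha_bar, beta_bar)."""
    A11 = payoff(*point1, *point1)
    A12 = payoff(*point1, *point2)
    A21 = payoff(*point2, *point1)
    A22 = payoff(*point2, *point2)
    ratio = (A22 - A12) / (A11 - A21)      # equals w / (1 - w)
    return ratio / (1 + ratio)


# Strategy 1 is in the set S (alpha_bar = 1 on the line x + y = -1/R);
# strategy 2 has alpha_bar < 0 and lies on the lower line x + y = -2.5.
R = 0.6
w = mixed_equilibrium((1.0, -1 / R - 1.0), (-0.5, -2.0))
```

For these sample points the bracket in (3.23) is negative below the computed w and positive above it, so the equilibrium is unstable and separates the two basins of attraction.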

Under other circumstances it is possible to get global stability.

Theorem 3.6 Assume that for some $i^* \in J$ and for all $j \ne i^*$ in J

$\bar\alpha_{i^*} + \bar\beta_{i^*} > \bar\alpha_j + \bar\beta_j,$
and $\bar\alpha_j \ge \bar\alpha_{i^*} > 0.$ (3.25)

Then

• For all $k \ne i^*$ in J and for all $j \in J$

$A_{i^*j} > A_{kj}.$ (3.26)

• Any population which contains $i^*$ strategists moves to fixation at the $i^*$
strategy. In fact,

$0 < \pi_{i^*} < 1 \implies \frac{d\pi_{i^*}}{dt} > 0.$ (3.27)

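Proof: This is a direct computation. For $i, j, k \in J$,

$A_{ij} - A_{kj} = \frac{\bar\alpha_j - \bar\beta_i}{\bar\beta_i\bar\beta_j - \bar\alpha_i\bar\alpha_j} - \frac{\bar\alpha_j - \bar\beta_k}{\bar\beta_k\bar\beta_j - \bar\alpha_k\bar\alpha_j}.$ (3.28)

The common denominator is positive and the numerator is

$\bar\alpha_j[\bar\beta_j(\bar\beta_k - \bar\beta_i) - \bar\alpha_j(\bar\alpha_k - \bar\alpha_i) + \bar\alpha_k\bar\beta_i - \bar\alpha_i\bar\beta_k].$ (3.29)

By hypothesis $\bar\alpha_j > 0$ and the expression in brackets can be rewritten as

$(\bar\alpha_i - \bar\beta_j)[(\bar\alpha_i + \bar\beta_i) - (\bar\alpha_k + \bar\beta_k)] + [(\bar\alpha_i + \bar\beta_i) - (\bar\alpha_j + \bar\beta_j)](\bar\alpha_k - \bar\alpha_i).$ (3.30)

Since $\bar\beta_j < 0$, this is positive when $i = i^*$ by the assumptions we have made.
This proves (3.26). It implies that $A_{i^*\pi} > A_{k\pi}$ for all $k \ne i^*$ and all $\pi \in \Delta$. Hence, if
$\pi_{i^*} < 1$ then $A_{i^*\pi} > A_{\pi\pi}$.
□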

Remark: The inequalities (3.26) say that the $i^*$ strategy dominates all of the
other strategies. It was just such domination in the original game which
drove the rational players to the dd outcome. Here it is a cooperative strategy
such as the ones in $\mathcal{S}$ which is dominating a wide class of alternatives.

Question 3.7 Suppose we restrict to the case where J indexes ZDS's lying
on different lines $x + y = -z^{-1}$ to avoid degeneracies. We ask:

• How large a population can coexist? If N is the size of J, the number
of competing strategies, then for what N do there exist examples with
an interior equilibrium, that is, an equilibrium $\pi$ such that $\pi_j > 0$ for
all $j \in J$? When is there a locally stable interior equilibrium? For how
large an N can permanence occur (see Hofbauer and Sigmund Section
3), that is, where the boundary of $\Delta$ is a repellor? The Brouwer Fixed
Point Theorem implies that such a permanent system always admits
an interior equilibrium. When an interior equilibrium does not exist
there is always some sort of dominance among the mixed strategies of
the game $\{A_{ij}\}$. See Akin (1980) and Akin and Hofbauer (1982).

• Can there exist a stable, closed invariant set containing no equilibria,
e.g. a stable limit cycle?

There is an alternative version of the dynamics which explicitly considers
for X not the payoff $s_X$ but the advantage that X has over Y. That is, the
addition to the growth rate is given not by $s_X$ but by the difference $s_X - s_Y$.
This amounts to replacing $A_{ij}$ by the anti-symmetric matrix $S_{ij} = A_{ij} - A_{ji}$
so that the game becomes zero-sum. In this case, we define $\xi_i = -(\bar\alpha_i + \bar\beta_i)$
so that $\xi_i$ varies in the interval $[R^{-1}, P^{-1}]$. From (3.13) we get

$S_{ij} = K_{ij}(\xi_i - \xi_j).$ (3.31)

Since $\{S_{ij}\}$ is antisymmetric, $S_{\pi\pi} = 0$.

As $\xi$ increases we are moving toward a lower line and this is what occurs.
The system moves toward the lowest line, that is, toward the lowest joint
payoff which occurs when $\xi_i$ is at its maximum.

Theorem 3.8 Define $\xi_\pi = \sum_{i \in J}\pi_i\xi_i$. For the system with

$\frac{d\pi_i}{dt} = \pi_i(S_{i\pi} - S_{\pi\pi}) = \pi_i S_{i\pi}$ (3.32)

we have on $\Delta$

$\frac{d\xi_\pi}{dt} \ge 0$ (3.33)

with equality iff $\pi_i\pi_j > 0 \implies \xi_i = \xi_j$.

If $i \ne j$ implies $\xi_i \ne \xi_j$, i.e. distinct strategies lie on different lines,
then the system converges to the vertex $v(i^*)$ where $\xi_{i^*}$ is the maximum value
among the strategies initially present.

Proof: Because $K_{ij}$ is symmetric and positive, $\frac{d\xi_\pi}{dt}$ equals

$\frac12\left[\sum_{i,j \in J}\pi_i\pi_jK_{ij}\,\xi_i(\xi_i - \xi_j) - \sum_{i,j \in J}\pi_i\pi_jK_{ij}\,\xi_j(\xi_i - \xi_j)\right]$
$= \frac12\sum_{i,j \in J}\pi_i\pi_jK_{ij}(\xi_i - \xi_j)^2 \ge 0.$ (3.34)

The final convergence result requires a bit of technical detail which I will
merely sketch.

By restricting to a suitable face if necessary, we may assume that our
initial position was in the interior of $\Delta$ with all strategies present in the
population. The set of limit points for the solution path of the system is
a connected set of points on which $\frac{d\xi_\pi}{dt} = 0$. Since the $\xi_i$'s are distinct this
occurs only at the vertices and so the solution path converges to a vertex.
This does not exclude the possibility that it converges to a vertex v(j) with
an intermediate value $\xi_j$. However, the vertex v(j) is a hyperbolic fixed point
for the system with stable manifold the face defined by the vertices v(i) with
$\xi_i \le \xi_j$ and with unstable manifold the face defined by the vertices with
$\xi_k \ge \xi_j$. The local behavior of such hyperbolic fixed points ensures that no
solution path outside the stable manifold will converge to v(j). Hence, all
the interior paths approach the attractor $v(i^*)$.
□

Remark: Notice that if $K_{ij}$ were replaced by 1 in (3.32) then by replacing
$\xi_i - \xi_j$ by $(\xi_i - \xi_\pi) - (\xi_j - \xi_\pi)$ and expanding out we get that the rate of increase of
the mean of $\{\xi_i\}$ is exactly its variance.
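The monotone drift toward the lowest line can be observed in a small simulation. The sketch below integrates (3.32) with an explicit Euler step (our choice of integrator) for three hypothetical ZDS points, assuming R = 0.6 and P = 0.2 so that all three lie in the strip; the function names are ours.

```python
def zero_sum_step(pi, points, dt):
    """Euler step of d pi_i/dt = pi_i S_{i pi} for ZDS points (alpha_bar_i, beta_bar_i)."""
    n = len(pi)
    xi = [-(a + b) for a, b in points]
    S_i = []
    for i in range(n):
        ai, bi = points[i]
        total = 0.0
        for j in range(n):
            aj, bj = points[j]
            K = 1.0 / (bi * bj - ai * aj)           # K_ij of (3.13)
            total += pi[j] * K * (xi[i] - xi[j])    # pi_j S_ij with S_ij from (3.31)
        S_i.append(total)
    return [pi[i] * (1 + dt * S_i[i]) for i in range(n)]


def mean_xi(pi, points):
    """Population mean xi_pi = sum_i pi_i xi_i."""
    return sum(w * -(a + b) for w, (a, b) in zip(pi, points))


# Three hypothetical points with xi = 1/R, 2.5 and 4 respectively.
R = 0.6
points = [(1.0, -1 / R - 1.0), (0.5, -3.0), (-0.5, -3.5)]
pi = [0.8, 0.15, 0.05]
history = [mean_xi(pi, points)]
for _ in range(20000):
    pi = zero_sum_step(pi, points, 0.1)
    history.append(mean_xi(pi, points))
```

In this run the mean never decreases and the population ends up concentrated on the third strategy, the one with the largest $\xi_i$.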



4 Good Strategies, In General 

Finally, we move beyond the ZDS types in our search for good strategies. 
We begin by extending the Press-Dyson Equations. Define 

\[ L = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} \tag{4.1} \]

and for any distribution vector $v$ we define

\[ v_x = v^T L = v_2 + v_3. \tag{4.2} \]

Suppose that X and Y play strategies p and q and with a given initial 
distribution the resulting stationary distribution is v. It is obvious from the 
normalization (2.1) that

\[ \tfrac12 (s_X + s_Y) = v_1 R + v_x \cdot \tfrac12 + v_4 P, \qquad s_Y - s_X = v_2 - v_3. \tag{4.3} \]

In particular,

\[ v_x \geq |v_2 - v_3| = |s_Y - s_X|, \qquad s_Y = s_X \ \Longrightarrow\ v_2 = v_3 = v_x/2. \tag{4.4} \]

Lemma 4.1 Assume that $v$ is the stationary distribution for the programs
of X and Y. The following are equivalent.

(a) $\tfrac12 (s_X + s_Y) = R$.

(b) $s_X = s_Y = R$.

(c) $v = (1,0,0,0)$.

When these hold then $v_x = 0$.

Proof: By (4.3), $\tfrac12 (s_X + s_Y) = R$ and $R > P, \tfrac12$ imply $v_x = v_4 = 0$ and
$v_1 = 1$. That is, (a) implies (c) and $v_x = 0$. That (c) implies (b) and (b)
implies (a) are obvious.
□ 



Remark: That is, the average payoff is at its maximum exactly when 
both players receive the cooperative payoff and this occurs iff the stationary 
distribution is fixation at cc. In particular, {cc} must be a terminal set which 
requires that both players use agreeable strategies. 



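It is easy to check that

\[ \det \begin{pmatrix} R & R & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ P & P & 0 & 1 \end{pmatrix} = 2R - 2P > 0. \tag{4.5} \]

It follows that $\{S_X, S_Y, \mathbf{1}, L\}$ is a basis for $\mathbb{R}^4$. Hence, for any strategy vector
$p$ there are unique real numbers $\alpha, \beta, \gamma, \delta$ such that the X player Press-Dyson
vector

\[ \tilde p = \alpha S_X + \beta S_Y + \gamma \mathbf{1} + \delta L. \tag{4.6} \]

Of course, the strategy is a ZDS iff $\delta = 0$.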
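Since the coordinates in (4.6) are used repeatedly, it may help to record them explicitly. The following is a sketch under the assumptions that outcomes are ordered cc, cd, dc, dd, that the normalization gives $S_X = (R, 0, 1, P)$ and $S_Y = (R, 1, 0, P)$, and that $\tilde p = p - (1,1,0,0)$. Comparing the four components of (4.6) and solving gives:

```latex
\begin{aligned}
\alpha + \beta &= \frac{\tilde p_1 - \tilde p_4}{R - P}, &
\gamma &= \frac{R\,\tilde p_4 - P\,\tilde p_1}{R - P}, \\
\alpha - \beta &= \tilde p_3 - \tilde p_2, &
\delta &= \tfrac12(\tilde p_2 + \tilde p_3) - \tfrac12(\alpha + \beta) - \gamma .
\end{aligned}
```

For example, Tit-for-Tat has $p = (1,0,1,0)$ and $\tilde p = (0,-1,1,0)$, so $\alpha = 1$, $\beta = -1$ and $\gamma = \delta = 0$.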



Theorem 4.2 Assume that $p$ and $q$ are Markov strategy vectors with associated
Markov matrix $M$. Let $\alpha, \beta, \gamma, \delta$ be such that $\tilde p = \alpha S_X + \beta S_Y +
\gamma \mathbf{1} + \delta L$. If either $M$ is convergent or $p$ is not lazy then the Generalized
Press-Dyson Equation

\[ \alpha s_X + \beta s_Y + \gamma + \delta v_x = 0 \tag{4.7} \]

is satisfied for any initial plays by X and Y.

Proof: In the convergent case Proposition 2.2 implies the generalization
of fl22]): 

\[ \alpha s_X + \beta s_Y + \gamma + \delta v_x = \frac{D(p, q, \alpha S_X + \beta S_Y + \gamma \mathbf{1} + \delta L)}{D(p, q, \mathbf{1})}. \tag{4.8} \]

Because the determinant is zero when two columns are equal, (4.7) follows.

If $p$ is not lazy and $q$ is arbitrary we extend to the non-convergent case
by using Lemma 2.1 as in the proof of Theorem 2.4. Notice that if $v =
p_I v_I + p_J v_J$ then $v_x = v^T L = p_I v_I^T L + p_J v_J^T L$. So the generalized Press-
Dyson equation for $M$ comes from averaging the equations for $M_I$ and $M_J$
as before.

□ 
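Equation (4.7) can be checked numerically in the convergent case. The sketch below assumes payoffs $R = 0.6$, $P = 0.2$ with $T = 1$, $S = 0$, and assumes the usual convention that $q$ is indexed from Y's point of view (so Y reads X's cd as dc). The strategies `p` and `q` are hypothetical interior points, and the helper names are illustrative only.

```python
R, P = 0.6, 0.2                      # assumed payoffs under T = 1, S = 0
S_X = (R, 0.0, 1.0, P)
S_Y = (R, 1.0, 0.0, P)


def markov_matrix(p, q):
    """Transition matrix on the outcomes cc, cd, dc, dd (X's move first)."""
    q_seen = (q[0], q[2], q[1], q[3])    # Y sees cd and dc with roles swapped
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
            for a, b in zip(p, q_seen)]


def stationary(M, steps=2000):
    """Iterate v -> vM from the uniform distribution (adequate for positive M)."""
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v


def coordinates(p):
    """Coefficients (alpha, beta, gamma, delta) of p - (1,1,0,0) in (4.6)."""
    t = (p[0] - 1, p[1] - 1, p[2], p[3])
    sigma = (t[0] - t[3]) / (R - P)      # alpha + beta
    gamma = t[0] - sigma * R
    alpha = (sigma + t[2] - t[1]) / 2
    beta = sigma - alpha
    delta = t[1] - beta - gamma
    return alpha, beta, gamma, delta


def press_dyson_residual(p, q):
    """Left-hand side of (4.7) evaluated at the stationary distribution."""
    v = stationary(markov_matrix(p, q))
    s_X = sum(x * y for x, y in zip(v, S_X))
    s_Y = sum(x * y for x, y in zip(v, S_Y))
    v_x = v[1] + v[2]
    alpha, beta, gamma, delta = coordinates(p)
    return alpha * s_X + beta * s_Y + gamma + delta * v_x


# Hypothetical interior strategies, so that M is positive and convergent.
p = (0.9, 0.25, 0.65, 0.2)
q = (0.8, 0.5, 0.3, 0.1)
residual = press_dyson_residual(p, q)
print(abs(residual) < 1e-9)
```

The residual vanishes because the left side of (4.7) is just $v \cdot \tilde p$, and stationarity forces the probability that X plays c next, $v \cdot p$, to equal the probability $v_1 + v_2$ that X has just played c.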



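The sign constraints on the X Press-Dyson vector $\tilde p$ are

\[ \begin{aligned}
(\alpha + \beta) R + \gamma &\leq 0, \\
\beta + \gamma + \delta &\leq 0, \\
\alpha + \gamma + \delta &\geq 0, \\
(\alpha + \beta) P + \gamma &\geq 0.
\end{aligned} \tag{4.9} \]

As before we get $\gamma \geq 0$ and $\gamma = 0$ iff $\alpha + \beta = 0$. In which case, $\alpha \geq -\delta \geq -\alpha$.

Thus, $\alpha \geq |\delta|$ and $\tilde p = (0, -\alpha + \delta, \alpha + \delta, 0)$.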







The top strategies are given by

\[ p = \begin{cases}
(1,\ 2\delta/(\alpha + \delta),\ 1,\ 0) & (\delta \geq 0), \\
(1,\ 0,\ (\alpha + \delta)/(\alpha - \delta),\ 0) & (\delta \leq 0).
\end{cases} \tag{4.10} \]

By varying $\alpha$ and $\delta$ and multiplying by a positive constant $k \leq 1$ to allow
mixtures with Repeat, we achieve all strategies which are both agreeable
and firm. This is a square with extreme points Tit-for-Tat $(1,0,1,0)$ and
the three lazy strategies $(1,1,0,0)$ (Repeat), $(1,0,0,0)$ and $(1,1,1,0)$. The
latter are the two top strategies which are lazy, agreeable and firm.

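Otherwise, we normalize as before, defining $\bar\alpha = \alpha/\gamma$, $\bar\beta = \beta/\gamma$, $\bar\delta = \delta/\gamma$.
The sign constraints become

\[ -P^{-1} \leq \bar\alpha + \bar\beta \leq -R^{-1} \quad \text{and} \quad \bar\beta \leq -1 - \bar\delta \leq \bar\alpha. \tag{4.11} \]

Thus, the pairs $(\bar\alpha, \bar\beta)$ lie in the region of the $xy$ plane between the lines
$x + y = -R^{-1}$ and $x + y = -P^{-1}$ and with $y \leq x$. Again the agreeable
strategies lie on the line $x + y = -R^{-1}$ and the firm strategies on
$x + y = -P^{-1}$.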

The Generalized Press-Dyson Equation becomes

\[ \bar\alpha s_X + \bar\beta s_Y + 1 + \bar\delta v_x = 0. \tag{4.12} \]

In particular, if $p$ is agreeable so that $\bar\beta = -\bar\alpha - R^{-1}$ then

\[ R^{-1} s_Y + \bar\alpha (s_Y - s_X) = 1 + \bar\delta v_x. \tag{4.13} \]

Example 4.3 After A3, jjj) we noted that when two ZDS's play one another, 
the payoffs are independent of the multipliers $\gamma$, $g$ of $\tilde p$ and $\tilde q$ as long as these
are positive. Equivalently, mixing the strategies with Repeat $= (1,1,0,0)$
does not affect the payoffs. This is not true in general.

Proof: For a strategy $p$ let $p^\gamma = (1 - \gamma)p + \gamma(1,1,0,0)$ with $0 < \gamma < 1$.
For example, if $q = (1,0,1,0)$, Tit-for-Tat, then $q^g = (1, g, 1 - g, 0)$ which
is still an exceptional ZDS. So if Y plays any $q^g$ then $s_X = s_Y$. If X plays
$p$ with $\tilde p = \gamma(\bar\alpha S_X + \bar\beta S_Y + \mathbf{1} + \bar\delta L)$ then by (4.12)

\[ s_X = s_Y = -[1 + \bar\delta v_x]/(\bar\alpha + \bar\beta) \tag{4.14} \]

for all $\gamma$. However, we will see that $v_x$ may change when $\gamma$ does and so the
payoffs may change when $\bar\delta \neq 0$, i.e. when the X strategies are not ZDS.
Notice that for a non-exceptional ZDS, $\tilde p_2 = -1$, $\tilde p_3 = 1$ requires $-\bar\beta - 1 =
\bar\alpha + 1$ which cannot happen when $\bar\alpha + \bar\beta$ is equal or close to $-R^{-1}$.

Let $\tilde p = (-1 + p_1, -1, 1, p_4)$ and so $p = (p_1, 0, 1, p_4)$ with $0 < p_4 < 1$
and with $p_1 < 1$ but close to 1. Let $\hat p = (1, 0, 1, p_4)$. Since $p_4 > 0$ these
strategies are not exceptional and so, as observed above, we have $\delta \neq 0$. We
describe all the terminal sets of the eight different pairings in the following
table where All $= \{cc, cd, dc, dd\}$ and $0 < \gamma, g < 1$.

\[ \begin{array}{c|cc}
X \backslash Y & q & q^{g} \\ \hline
\hat p & \{cc\},\ \{cd, dc\} & \{cc\} \\
\hat p^{\gamma} & \{cc\} & \{cc\} \\
p & \{cd, dc\} & \mathrm{All} \\
p^{\gamma} & \mathrm{All} & \mathrm{All}
\end{array} \]

Thus, when $\hat p$ plays $q$ there are two terminal sets and the matrix is not
convergent. The remaining cases are convergent. Changing to $\hat p^\gamma$ (or to $q^g$)
introduces an edge in the graph from $cd$ to $cc$ (resp. from $dc$ to $cc$) and so
$cd$ and $dc$ become transient. When $p$ plays $q$ the terminal set is $\{cd, dc\}$ and
so $v_x = 1$. Changing to $p^\gamma$ or to $q^g$ introduces edges to $cc$ and to $dd$ from
within $\{cd, dc\}$. With All as terminal set, $v$ is a positive vector and so now
$v_x < 1$.

□ 
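The dependence of the payoffs on the mixing weight in Example 4.3 can be seen in a concrete instance. The sketch below uses exact rational arithmetic and assumes payoffs $R = 3/5$, $P = 1/5$ (with $T = 1$, $S = 0$), the hypothetical strategy $p = (9/10, 0, 1, 1/2)$ for X and Tit-for-Tat for Y; as before, $q$ is assumed to be indexed from Y's point of view. The function names are illustrative only.

```python
from fractions import Fraction as F

R, P = F(3, 5), F(1, 5)              # assumed payoffs under T = 1, S = 0
REPEAT = (F(1), F(1), F(0), F(0))
TFT = (F(1), F(0), F(1), F(0))


def mix(p, w):
    """The strategy (1 - w) * p + w * Repeat."""
    return tuple((1 - w) * a + w * r for a, r in zip(p, REPEAT))


def markov_matrix(p, q):
    """Transition matrix on the outcomes cc, cd, dc, dd (X's move first)."""
    q_seen = (q[0], q[2], q[1], q[3])    # Y sees cd and dc with roles swapped
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
            for a, b in zip(p, q_seen)]


def stationary(M):
    """Solve vM = v, sum(v) = 1 exactly; assumes a single terminal set."""
    n = len(M)
    A = [[M[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n - 1)]
    A.append([F(1)] * n + [F(1)])        # normalization row
    for col in range(n):                 # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def outcome(p, q):
    """Return (s_X, s_Y, v_x) for the stationary distribution of p against q."""
    v = stationary(markov_matrix(p, q))
    s_X = v[0] * R + v[2] + v[3] * P
    s_Y = v[0] * R + v[1] + v[3] * P
    return s_X, s_Y, v[1] + v[2]


p = (F(9, 10), F(0), F(1), F(1, 2))
results = {w: outcome(mix(p, w), TFT) for w in (F(0), F(1, 5), F(1, 2))}
for w, (s_X, s_Y, v_x) in results.items():
    print(w, s_X, s_Y, v_x)
```

In this instance the two payoffs always agree, as they must against Tit-for-Tat, but their common value moves from 1/2 to 13/25 to 37/70 as the weight on Repeat goes from 0 to 1/5 to 1/2, while $v_x$ drops from 1 to 2/5 to 1/7.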

We use these results to find good strategies which are not ZDS. However,
first we must close a loophole in Theorem 4.2 and show that the Generalized
Press-Dyson Equation always holds when $\gamma > 0$.

Proposition 4.4 If $\tilde p = \gamma(\bar\alpha S_X + \bar\beta S_Y + \mathbf{1} + \bar\delta L)$ then the strategy is lazy if
$\bar\alpha = \bar\beta = -1 - \bar\delta$ and either $\bar\alpha + \bar\beta = -R^{-1}$ or $\bar\alpha + \bar\beta = -P^{-1}$. For each of
these cases the Generalized Press-Dyson Equation holds for any strategy $q$ of
Y and any initial plays.

Proof: The two cases yield $p = (1, 1, 0, p_4)$ with $p_4 > 0$ and $p =
(p_1, 1, 0, 0)$ with $p_1 < 1$. We now proceed as in Proposition 2.6. The only
situations which cannot be handled by the methods of Lemma 2.1 are a terminal
set $\{cd\}$ or $\{dc\}$. In each of these cases, $s_X + s_Y = 1$ and $v_x = 1$. Hence,
$\bar\alpha = \bar\beta = -1 - \bar\delta$ implies (4.12) as required.

□ 

Theorem 4.5 Let $p$ be a strategy with X Press-Dyson vector $\tilde p = \gamma(\bar\alpha S_X +
\bar\beta S_Y + \mathbf{1} + \bar\delta L)$ and $\gamma > 0$. Assume that $\bar\alpha + \bar\beta = -R^{-1}$ so that $p$ is agreeable.
If $\bar\alpha > \bar\delta$ and $0 \geq \bar\delta$ then for any strategy $q$ played by Y and any choice of
initial plays, $s_Y \geq R$ implies $s_X = s_Y = R$. That is, $p$ is a good strategy.



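Proof: Because $s_Y - s_X = v_2 - v_3$, (4.13) is equivalent to

\[ \begin{aligned}
R^{-1} s_Y + [\bar\alpha (v_2 - v_3) - \bar\delta (v_2 + v_3)] &= 1, \quad \text{i.e.} \\
R^{-1} s_Y + [(\bar\alpha - \bar\delta) v_2 - (\bar\alpha + \bar\delta) v_3] &= 1.
\end{aligned} \tag{4.16} \]

Now if $R^{-1} s_Y \geq 1$ then by Proposition 2.10 $s_X \leq R$ and so $s_Y - s_X \geq 0$.
Thus, $v_2 \geq v_3$. Since $(\bar\alpha - \bar\delta) > 0$, (4.16) implies that

\[ 1 \geq R^{-1} s_Y + [(\bar\alpha - \bar\delta) - (\bar\alpha + \bar\delta)] v_3 = R^{-1} s_Y - 2 \bar\delta v_3. \tag{4.17} \]

Since $-2 \bar\delta v_3 \geq 0$, $R^{-1} s_Y \geq 1$ implies $R^{-1} s_Y = 1$ and $\bar\delta v_3 = 0$.

If $\bar\delta < 0$ then $v_3 = 0$. Since $\bar\alpha - \bar\delta > 0$, (4.16) implies $v_2 = 0$. Since
$0 = v_2 - v_3 = s_Y - s_X$, we have $s_X = s_Y = R$.

If $\bar\delta = 0$ then from (4.13), $\bar\alpha (s_Y - s_X) = 0$. Since $\bar\alpha > \bar\delta = 0$, $s_Y - s_X = 0$.
Again, $R = s_Y = s_X$.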

□ 



Remark: By the sign constraints, if $\bar\delta \geq \bar\alpha$ then $2\bar\delta + 1 \geq 0$. Thus, if
$\bar\delta < -\tfrac12$ the strategy $p$ is good.

On the other hand there are many agreeable strategies which are not 
good. 

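Theorem 4.6 Let $p$ be a strategy with X Press-Dyson vector $\tilde p = \gamma(\bar\alpha S_X +
\bar\beta S_Y + \mathbf{1} + \bar\delta L)$ and $\gamma > 0$. Assume that $\bar\alpha + \bar\beta = -R^{-1}$ so that $p$ is agreeable
but that $\bar\delta \geq \bar\alpha$. If Y plays Defect, i.e. $q = 0$, then $s_X < P$ and

\[ s_Y > R \ \text{ if } \bar\delta > \bar\alpha, \qquad s_Y = R \ \text{ if } \bar\delta = \bar\alpha. \tag{4.18} \]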



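Proof: Since $\bar\alpha + \bar\beta = -R^{-1}$ we have that

\[ \tilde p = \gamma \cdot \begin{pmatrix} 0 \\ -R^{-1} + 1 + \bar\delta - \bar\alpha \\ \bar\alpha + \bar\delta + 1 \\ 1 - R^{-1} P \end{pmatrix}. \tag{4.19} \]
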
Since $q = 0$ the Markov matrix $M$ of (1.4) reduces to

\[ M = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & p_2 & 0 & (1 - p_2) \\
0 & p_3 & 0 & (1 - p_3) \\
0 & p_4 & 0 & (1 - p_4)
\end{pmatrix}. \tag{4.20} \]



Note that $p_4 > 0$, i.e. $p$ is not firm and so $\{dd\}$ is not a terminal set. There
is a unique terminal set contained in $\{cd, dd\}$, either this entire closed set or
$\{cd\}$ if $p_2 = 1$. We have

\[ v = \begin{pmatrix} 0 \\ p_4 \\ 0 \\ 1 - p_2 \end{pmatrix} \div [p_4 + (1 - p_2)]. \tag{4.21} \]



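So $s_Y \geq R$ iff

\[ p_4 + (1 - p_2) P \geq [p_4 + (1 - p_2)] R, \]

i.e.

\[ \begin{aligned}
p_4 (1 - R) &\geq (1 - p_2)(R - P), \\
\tilde p_4 (1 - R) &\geq -\tilde p_2 (R - P), \\
(1 - R^{-1} P)(1 - R) &\geq (R^{-1} - 1 - (\bar\delta - \bar\alpha))(R - P), \quad \text{i.e.} \\
(R - P)(1 - R) &\geq (1 - R - R(\bar\delta - \bar\alpha))(R - P).
\end{aligned} \tag{4.22} \]

Since $1 > R > P$ this inequality holds and is strict iff $(\bar\delta - \bar\alpha) > 0$. Finally, $s_X = v_4 P$
which is less than $P$ since $1 - v_4 = v_2 > 0$.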
□ 
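Theorem 4.6 can be illustrated with exact arithmetic. The sketch below assumes payoffs $R = 3/5$, $P = 1/5$ (with $T = 1$, $S = 0$) and a hypothetical choice of $\bar\alpha$, $\bar\delta$, $\gamma$; the helper names `agreeable_strategy` and `against_defect` are illustrative only. It builds $p$ from (4.19), checks that the vector in (4.21) is stationary for the matrix in (4.20), and evaluates both payoffs.

```python
from fractions import Fraction as F

R, P = F(3, 5), F(1, 5)              # assumed payoffs under T = 1, S = 0


def agreeable_strategy(alpha, delta, gamma):
    """p = (1,1,0,0) + gamma * (vector of (4.19)), for normalized alpha, delta."""
    t = (F(0), 1 - 1 / R + delta - alpha, alpha + delta + 1, 1 - P / R)
    p = (F(1), 1 + gamma * t[1], gamma * t[2], gamma * t[3])
    if not all(0 <= x <= 1 for x in p):
        raise ValueError("parameters violate the sign constraints")
    return p


def against_defect(p):
    """Payoffs (s_X, s_Y) when Y always defects, via (4.20) and (4.21)."""
    M = [[F(0), p[i], F(0), 1 - p[i]] for i in range(4)]        # (4.20)
    norm = p[3] + (1 - p[1])
    v = [F(0), p[3] / norm, F(0), (1 - p[1]) / norm]            # (4.21)
    # Stationarity check: vM must reproduce v exactly.
    assert [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)] == v
    s_X = v[0] * R + v[2] + v[3] * P
    s_Y = v[0] * R + v[1] + v[3] * P
    return s_X, s_Y


# Hypothetical parameters with delta > alpha, i.e. the case s_Y > R.
p = agreeable_strategy(alpha=F(0), delta=F(1, 3), gamma=F(1, 2))
s_X, s_Y = against_defect(p)
print(p, s_X, s_Y)
```

Here $p = (1, 5/6, 2/3, 1/3)$, and always defecting earns Y the payoff $11/15 > R$ while X is held to $1/15 < P$. Taking $\bar\delta = \bar\alpha$ instead, for instance `agreeable_strategy(F(1, 4), F(1, 4), F(1, 4))`, gives $s_Y = R$ exactly.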



Comparing Theorem 4.5 and Theorem 4.6 we see that the only agreeable
strategies whose status remains undecided are those with $\bar\alpha > \bar\delta > 0$. I
conjecture that none of these are good. This is supported by the following
partial result which says that when $\bar\delta$ is large compared with the difference
$\bar\alpha - \bar\delta > 0$ then Y has a simple effective response against $p$.

Theorem 4.7 Let $p$ be a strategy with X Press-Dyson vector $\tilde p = \gamma(\bar\alpha S_X +
\bar\beta S_Y + \mathbf{1} + \bar\delta L)$ and $\gamma > 0$. Assume that $\bar\alpha + \bar\beta = -R^{-1}$ so that $p$ is agreeable
but that for some positive $K$: $0 < \bar\alpha - \bar\delta < K$ and $\bar\delta > \tfrac12 + [RK/(1 - R)]$. If
Y plays $q = (0,0,0,1)$ against $p$ then $s_Y > R$ and so $s_X < R$.

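Proof: Since Y uses $q = (0,0,0,1)$ the Markov matrix is:

\[ M = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & p_2 & 0 & 1 - p_2 \\
0 & p_3 & 0 & 1 - p_3 \\
p_4 & 0 & 1 - p_4 & 0
\end{pmatrix}. \tag{4.23} \]

So

\[ M^T - I = \begin{pmatrix}
-1 & 0 & 0 & p_4 \\
1 & p_2 - 1 & p_3 & 0 \\
0 & 0 & -1 & 1 - p_4 \\
0 & 1 - p_2 & 1 - p_3 & -1
\end{pmatrix}. \tag{4.24} \]

After several row operations we obtain the row equivalent matrix

\[ \begin{pmatrix}
1 & 0 & 0 & -p_4 \\
0 & p_2 - 1 & p_3 & p_4 \\
0 & 0 & 1 & -(1 - p_4) \\
0 & 0 & 0 & 0
\end{pmatrix}. \tag{4.25} \]

Thus,

\[ v = \begin{pmatrix} p_4 (1 - p_2) \\ p_3 + (1 - p_3) p_4 \\ (1 - p_4)(1 - p_2) \\ 1 - p_2 \end{pmatrix}
 \div [2(1 - p_2) + p_3 + (1 - p_3) p_4]. \tag{4.26} \]
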
So $s_Y > R$ iff

\[ p_4 (1 - p_2) R + (p_4 + p_3 - p_3 p_4) + (1 - p_2) P >
 [p_4 (1 - p_2) + (p_4 + p_3 - p_3 p_4) + (1 - p_4)(1 - p_2) + (1 - p_2)] R, \]

i.e.

\[ (p_3 + (1 - p_3) p_4)(1 - R) > (1 - p_2)[(1 - p_4) R + R - P]. \tag{4.27} \]

Next, note that $(p_3 + (1 - p_3) p_4) \geq p_3$ and $2R > [(1 - p_4) R + R - P]$. We
apply (4.19), noting that $1 - p_2 = -\tilde p_2$ and $p_3 = \tilde p_3$. We see that for $s_Y > R$
it suffices that

\[ \begin{aligned}
p_3 &> (1 - p_2) \frac{2R}{1 - R}, \quad \text{i.e.} \\
\bar\alpha + \bar\delta + 1 &> [R^{-1} - 1 + (\bar\alpha - \bar\delta)] \frac{2R}{1 - R}, \quad \text{or} \\
\bar\alpha + \bar\delta &> 1 + (\bar\alpha - \bar\delta) \frac{2R}{1 - R}.
\end{aligned} \tag{4.28} \]

So if $0 < \bar\alpha - \bar\delta < K$ it suffices that $\bar\delta > \tfrac12 + [RK/(1 - R)]$.
□ 
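The response $q = (0,0,0,1)$ of Theorem 4.7 can be tried on a concrete strategy. The sketch below assumes payoffs $R = 3/5$, $P = 1/5$ (with $T = 1$, $S = 0$) and the hypothetical parameters $\bar\alpha = 3/4$, $\bar\delta = 7/10$, $\gamma = 2/5$, $K = 1/10$, which satisfy the hypotheses of the theorem. It checks that the vector in (4.26) is stationary for the matrix in (4.23) and then compares the payoffs with $R$.

```python
from fractions import Fraction as F

R, P = F(3, 5), F(1, 5)              # assumed payoffs under T = 1, S = 0

# Hypothetical parameters: 0 < alpha - delta < K and delta > 1/2 + R*K/(1 - R).
alpha, delta, gamma, K = F(3, 4), F(7, 10), F(2, 5), F(1, 10)
hypotheses_hold = (0 < alpha - delta < K) and (delta > F(1, 2) + R * K / (1 - R))

# The agreeable strategy p = (1,1,0,0) + gamma * (vector of (4.19)).
p2 = 1 + gamma * (1 - 1 / R + delta - alpha)
p3 = gamma * (alpha + delta + 1)
p4 = gamma * (1 - P / R)

# Markov matrix (4.23) when Y plays q = (0,0,0,1).
M = [[F(0), F(1), F(0), F(0)],
     [F(0), p2, F(0), 1 - p2],
     [F(0), p3, F(0), 1 - p3],
     [p4, F(0), 1 - p4, F(0)]]

# Stationary distribution (4.26).
w = [p4 * (1 - p2), p3 + (1 - p3) * p4, (1 - p4) * (1 - p2), 1 - p2]
v = [x / sum(w) for x in w]
is_stationary = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)] == v

s_X = v[0] * R + v[2] + v[3] * P
s_Y = v[0] * R + v[1] + v[3] * P
print(hypotheses_hold, is_stationary, s_X, s_Y)
```

For these values $p = (1, 107/150, 49/50, 4/15)$, and Y's payoff comes out strictly above $R$ while X's falls well below it, in line with the theorem.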